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Note added for cond-mat: We show how various concepts from statistical physics, such 
as order parameter, thermodynamic hmit, and quantum phase transition, translate into 
corresponding biological concepts in mutation-selection models for sequence evolution and 
can be used in this context. The article takes a biological point of view and works in a 
population genetics framework, but contains an appendix especially written for physicists, 
which makes this correspondence clear. 



Abstract 

We analyze the equilibrium behavior of deterministic haploid mutation-selection models. 
To this end, both the forward and the time-reversed evolution processes are considered. The 
stationary state of the latter is called the ancestral distribution, which turns out as a key 
for the study of mutation-selection balance. We find that the ancestral genotype frequencies 
determine the sensitivity of the equilibrium mean fitness to changes in the corresponding 
fitness values and discuss implications for the evolution of mutational robustness. We further 
show that the difference between the ancestral and the population mean fitness, termed 
mutational loss, provides a measure for the sensitivity of the equilibrium mean fitness to 
changes in the mutation rate. The interrelation of the loss and the mutation load is discussed. 
For a class of models in which the number of mutations in an individual is taken as the 
trait value, and fitness is a function of the trait, we use the ancestor formulation to derive 
a simple maximum principle, from which the mean and variance of fitness and the trait 
may be derived; the results are exact for a number of limiting cases, and otherwise yield 
approximations which are accurate for a wide range of parameters. These results are applied 
to threshold phenomena caused by the interplay of selection and mutation (known as error 
thresholds). They lead to a clarification of concepts, as well as criteria for the existence of 
error thresholds. 



1 Introduction 

A lot of research in theoretical population genetics has been directed towards mutation-selection 
models in multilocus systems and infinite populations. One is usually interested in statistical 
properties of the equilibrium distribution of genotypes, like the means and variances of fitness 
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and trait (s). The standard approach to determine these starts out from the equihbrium condition 
for the genotype frequencies (which takes the form of an eigenvalue equation if the population is 
haploid). On this basis, a wealth of methods and results has been created; for a comprehensive 
and up-to-date account, see IBiirger (20001) . 

In this article, we present an alternative route, which relies on a time-reversed version of 
the mutation-selection process and its stationary distribution - to be called the ancestral dis- 
tribution, as opposed to the equilibrium distribution of the forward process. We will apply this 
approach to tackle a rather general class of models for haploids, or diploids without dominance. 
It is only assumed that fitness is a function of a trait, and genotypes with equal trait values 
have equivalent mutation patterns. In fact, this is a standard assumption, and is often implied 
without special mention. It applies, for example, if (geno)types are identified with a collection 
of loci with two alleles each (wildtype and mutant), which mutate independently and accord- 
ing to the same process, and the number of (deleterious) mutations plays the role of the trait. 
The assumption of permutation invariance (with respect to the loci) is certainly a distortion of 
biological reality, but, even in this simplified setting, general answers have previously been con- 
sidered impossible ( Charlesworth, 199C| ), and researchers have resorted to more specific choices of 
the fitness function and the mutation model (e.g. quadratic fitness functions and unidirectional 
mutation) . 

With the help of the ancestral distribution, we will be able to tackle general fitness functions 
(with arbitrary epistasis), as well as general mutation schemes (including arbitrary amounts of 
back mutation), from the permutation invariant class. The mutation-selection equilibrium will 
be characterized through a maximum principle which relates the equilibrium population to the 
ancestral one, and may be evaluated explicitly to yield expressions for the mean fitness and the 
mean trait value, as well as the variances of these quantities. The results are exact for a number 
of limiting cases, and otherwise yield approximations which are accurate for a wide range of 
parameters. 

The results will then be used to settle the long-standing issue of characterization and classi- 
fication of error threshold phenomena in this model class. An error threshold may be generally, 
but vaguely, circumscribed as a critical mutation rate beyond which mutation can no longer be 
controlled by selection and leads to genetic degeneration; for review, see [Eigen et al. (1989| ). 
Some, but by no means all, mutation-selection models display such behavior. It turns out that 
a consistent definition of an error threshold is rather subtle and must be sorted out first. On 
this basis, we will classify mutation-selection models according to their threshold behavior (if 
any). 

Since the article treats quite a number of topics, we start out with a brief reader's guide to 
the main results here and give hints on what parts can be skipped at a first reading. Let us also 
mention that Table | contains a glossary of repeatedly used notation. 

The scene is set in Section ^, where we will introduce the model (the continuous-time 
mutation-selection model) and establish its relationship with a multitype branching process. 
Two concepts that are central to this paper, the ancestor distribution and the mutation class 
limit, are developed in this section. Section |2.2| introduces the ancestor distribution as the sta- 
tionary distribution of the time-reversed branching process and links the algebraic properties of 
the model to a probabilistic picture that also helps to shape biological intuition. In order to 
allow quick progress to the results in the remainder of the article, however, we have summarized 



the main points in Section |2.3| . In |2.4| , the means and variances of the trait and of fitness with 
respect to the equilibrium population and with respect to the ancestors are introduced. In 2.5, 
the difference between the ancestral and the population mean fitness, termed mutational loss, 
is shown to provide a measure for the sensitivity of the equilibrium mean fitness to changes in 
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the mutation rate. This result is used and expanded in some of the apphcations in Sections ^ 
and 1^ but can be skipped at first reading. Sections |2.6| and 2.7 are mainly concerned with the 
mutation class limit, along with the proper scaling of fitness functions and mutation schemes. 
Like the well-known infinite-sites limit, this limit assumes an infinite number of types, but uses 
a different scaling. As a consequence, it is valid if the total mutation rate is large relative to 
typical fitness differences of types. In this paper, the mutation class limit is used to derive 
analytic expressions for means and variances of fitness and the trait for the general case with 
back mutations and a non-linear fitness function. It is also crucial for our discussion of threshold 
behavior in Section ^. 

Section ^ is a condensed summary of the main results related to the maximum principle. 
The mean fitness at mutation-selection balance equals the maximum of the difference between 
the fitness function and a so-called mutational loss function, where the position of the maximum 
determines the mean ancestral trait. Once these means are known, explicit expressions are 
available for the mean trait and the variances of fitness and trait. The derivations are postponed 
to Section which may be skipped at first reading. 

The following two sections are devoted to applications. Both are, to a large extent, indepen- 
dent of each other and rely only on the matter introduced in Sections |2| and |3[ In Section ^ we 
first discuss the evolutionary significance of the mutational loss, and then turn to the mutation 
load. Explicit expressions are derived for small (back) mutation rates; but arbitrary mutation 
rates are covered by the maximum principle, which may be interpreted as a generalized version 
of Haldane's principle. Consequences for the evolution of mutational effects and for mutational 
robustness are discussed. Finally a note is added as to the accuracy of the expressions for the 
means and variances. 

In Section ^, we first analyze the definitions available for the error threshold. It will turn 
out that various notions must be distinguished, which coincide only in special cases. For each 
of these notions, a criterion for the existence of an error threshold is derived from the maximum 
principle. Furthermore, the phenomena are illustrated by means of examples and discussed with 
respect to their biological implications. Section provides a summary and an outlook. 

Appendix |A| describes the connection between our mutation-selection model and a system of 
quantum-statistical mechanics, which had been used previously (e.g. Baake et al, 199^ ; Baake| 
and Wagner, 2001 ) to solve a more restricted model class, and which also served as the source of 
concepts and methods for the current article. However, the body of the paper does not require 
any knowledge of physics and remains entirely within the framework of population genetics and 
classical probability theory. Appendices ^ and |y, finally, contain the proofs from Sections |^ and 
g, respectively. 



2 Model setup 
2.1 The model 

We consider the evolution of an infinite population of haploid individuals (or diploids with- 
out dominance) under mutation and selection. Disregarding environmental effects, we take 
individuals to be fully described by their genotypes, which are labeled by the elements of 
{!,..., M}. Let pi{t) be the relative frequency of type i at time t, so that J2iPi(.^) — 1' 
and let p(t) = {pi{t), . . . ,pM{t))'^ with T denoting transposition. Throughout this article we 
will use the formalism for overlapping generations, which works in continuous time, and only 
comment on extensions to the analogous model for discrete generations. The standard system 
of differential equations which describes the evolution of the vector p{t) is ( Prow and Kimura , 
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TABLE I. Glossary of repeatedly used notation. Symbols are given together with the section 
or equation in which they are defined. Symbols whose scope is only a single section are not 
shown. 
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ancestor frequencies 
mutational loss 
mutational loss function 
evolution matrix 
identity matrix 
genotype/class labels 
mutation load 
mutation rates 
mutation matrix 
number of mutation classes / 
sequence length 
population frequencies 
generator of reversed process 
reproduction rates 
fitness function 
reproduction matrix 
mutational effects 
(binary) sequence 
time evolution matrix 
time 

genomic mutation rates 
mutation functions 
variances 

mutational distance 
mutation classes 
arbitrary trait 

relative reproductive success 
overall factor for reproduction 
rates 

biallelic mutation asymmetry 
parameter 

largest eigenvalue of H 
overall mutation rate 



1970; 


Hofbauei 




1985; see also Biirgei 




2000) 



Vi(t) = {R, - R(t)\vi{t) + ^KjPj(t) - mjiVi(t)\ . (1) 

3 

Here, Ri is the Malthusian fitness of type i, which is connected to the respective birth and 
death rates as Ri = Bi — Di, and R{t) = Yli^iPii^) designates the mean fitness. Further, ruij 
is the rate at which a j individual mutates to i, and the dot denotes the time derivative. In 
this model, mutation and selection are assumed to be independent processes which go on in 
parallel. However, mutation may also be viewed as occuring during reproduction. In this case. 
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^ |i(l+K) 
^l(l-K) 

FIG. 1. Rates for mutations and back mutations at each site or locus of a biallelic sequence. 

the mutation rate is given by rriij := VijBj, where Vij is the respective mutation probability 
during a reproduction event. Since, formally, this leads to the same model, it need not be 
discussed further. 

For some of the main results of this article, further assumptions on the mutation scheme are 
required. To this end, we collect genotypes into classes of equal fitness, < k < N, and 
assume mutations only to occur between neighboring classes. Let Rk denote the fitness of class 
k and the mutation rate from class to Xk±i the total rate for each genotype in X/^ to 
mutate to some genotype in class Xk±i), with the convention Uq = = 0. Thus, we obtain a 
variant of the so-called single-step mutation model: 

p,it) = [R, - R{t) - U+ - U.jp.it) + U+^,p,_,{t) + . (2) 

(Here, the convention p-i{t) = P]\f^i{t) = is used.) We can, for example, think of Xq as the 
wildtype class with maximum fitness and fitness only depending on the number of mutations 
carried by an individual. If, further, mutation is modeled as a continuous point process (or 
if multiple mutations during reproduction can be ignored), Eq. (||) reduces to (|2|), with an 
appropriate choice of mutation classes. Depending on the realization one has in mind, the Uk 
then describe the total mutation rate affecting the whole genome or just some trait or function. 

In most of our examples, we will use the Hamming graph as our genotype space. Here, 
genotypes are represented as binary sequences s = siS2 ... sat G {+, — thus M = 2^ . The 
two possible values at each site, -|- and — , may be understood either in a molecular context 
as nucleotides (purines and pyrimidines) or, on a coarser level, as wildtype and mutant alleles 
of a biallelic multilocus model. We will assume equal mutation rates at all sites, but allow 
for different rates, ^(1 + k) and ^(1 — k), for mutations from + to — and for back mutations, 
respectively, according to the scheme depicted in Fig. ||. 

Clearly, the biallelic model reduces to a single-step mutation model (with the same A^) if the 
fitness landscape]^ is invariant under permutation of sites. To this end, we distinguish a reference 
genotype s+ = + + ...+, in most cases the wildtype or master sequence, and assume that the 
fitness Rg of sequence s depends only on the Hamming distance k = (iH(s,s_|_) to s+ (i.e. the 
number of mutations, or '— ' signs in the sequence). The resulting total mutation rates between 
the Hamming classes X^ and Xk±i read 

U+ = ^{l + k){N - k) and = - K)k (3) 

if mutation is assumed to be an independent point process at all sites. We usually have the 
situation in mind in which fitness decreases with k and will therefore speak of C/^ and as 
the deleterious and advantageous mutation rates. However, monotonic fitness is never assumed, 
unless this is stated explicitly. 

In much of the following, we will treat the general model (||), which builds on single genotypes, 
and the single-step mutation model (|2|), in which the units are genotype classes, with the help 
of a common formalism. To this end note that both models can be recast into the following 



^We use the notion of a fitness landscape (Kauffman and Levin, 1987) as synonymous with fitness function for 



the mapping from genotypes to individual fitness values. 
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general form using matrices of dimension M, respectively N + 1: 



Pit) = (H - R{t)I)p{t) . (4) 

Here, I is the identity. The evolution matrix H = R + M is composed of a diagonal matrix 
R that holds the Malthusian fitness values, and the mutation matrix M = (Mij) with either 
off-diagonal entries reading rriij, or with on the secondary diagonals. The diagonal elements 
in each case are = — ^j^i Mji, hence the column sums vanish. Where the more restrictive 
form of the single-step model is needed, this will be stated explicitly. Unless we talk about 
unidirectional mutation [Uj^ = for the single-step mutation model), we will always assume 
that M is irreducible (i.e. each entry is non-zero for a suitable power of M). 

Let now T(f) := exp(fH), with matrix elements Tij{t). Then, the solution of (§) is given by 
(see, e.g., [Biirger, 2000| , Ch. III.l) 



as can easily be established by differentiating and using J2i j ^ijPji^) — J2i^iPii^) — R{t)- 
Due to irreducibility, the population vector converges to a unique, globally stable equilibrium 
distribution p := limj^ooP(i) with pi > for all i, which describes mutation-selection balance. 
By the Perron-Frobenius theorem, p is the (right) eigenvector corresponding to the largest 
eigenvalue, Amax, of H. 

2.2 The branching process — forward and backward in time 

Our approach will heavily rely on genealogical relationships, which contain more detailed infor- 
mation than the time course of the relative frequencies (||) alone. Let us, therefore, reconsider 
the mutation-selection model as a branching process. Branching processes have been a classical 
tool in population genetics to approximate the fixation rates of a single mutant type in a finite 
population. This approach goes back to Haldane (1927D (see also Grow and Kimura, 1970|), an d 



has been used in many recent applications as well (e.g. [Barton, 1995 ; Otto and Barton, 1997 ). 



We pursue a different route here by considering the process of mutation, reproduction and 
death as a (continuous-time) multitype branching process, as described previously for the quasis- 
pecies model (Demetrius et al, 1985| ; Hofbauer and Sigmund 198^ , Ch. 11.5). Let us start with a 



finite population of individuals, which reproduce (at rates Bi), die (at rates Di), or change type 
(at rates Mij) independently of each other, without any restriction on population size. Let Yi{t) 
be the random variable denoting the number of individuals of type i at time t, and ni{t) the cor- 
responding realization; collect the components into vectors Y and n, and let ej be the i-th unit 
vector. The transition probabilities for the joint distribution, Pic(Y{t) = n{t) \ Y{0) = ^(O)), 
which we will abbreviate as Pr(n(t) | n(0)) by abuse of notation, are governed by the differential 
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t+x 



t+x 



FIG. 2. The multitype branching process. Individuals reproduce (branching hues), die (ending hnes), or 
mutate (hnes changing type) independently of each other; the various types are indicated by different line 
styles. Left: The fat lines mark the clone founded by a single individual (bullet) at time t. Right: The 
fat lines mark the lines of descent defined by three individuals (bullets) at time t + t. After coalescence 
of two lines, their ancestor receives twice the 'weight', as indicated by extra fat lines. 



equation^ 



- Pr(n(t) I n(0)) = - (^{Bi + D, + Y1 Pr(n(t) | n(0)) 

+ ^B,(ni(t) - 1) Pr(n(t) - | n(0)) 

i 

+ ^ A (riiit) + 1) Pr (n(t) + | n(0)) 

i 

+ J2Mij{nj{t) + 1) Pr(n(t) - e, + ej \ n(0)) . 



(6) 



The connection of this stochastic process with the deterministic model described in Section 
2.1 is twofold. Firstly, in the limit of an infinite number of individuals (n := X^j?^i(0) oo), 
the sequence of random variables Y^'^\t)/n converges almost surely to the solution y{t) of 



y = Hy with initial condition y{0) = n{0)/n ( [Ethier and Kurtz, 1986| , Ch. 11, Thm. 2.1). 
That is, Pr(lim„^oo Y^^\t)/n = y{t)) = 1, and the superscript (n) denotes the dependence on 
the number of individuals. The connection is now clear since p{t) := y{t)/ Y^iVii^) solves the 
mutation-selection equation (|^). 

Secondly, taking expectations of Yi and marginalizing over all other variables, one obtains 
the differential equation for the conditional expectations 



di 



E{Yi{t) I n(0)) ={Bi - Di)E{Y,{t) \ n(0)) 



+ Y,[M,jE{Yj{t) I n(0)) - MjiE{Yi{t) \ n(0))] . 



(7) 



Clearly, our evolution matrix H appears as the infinitesimal generator here, and the solution is 



given by T(t)n(0), where T(t) := exp(iH) (see also [Hofbauer and Sigmund, 198q , Ch. 11.5). In 
particular, we have E{Yi{t) \ e^) = Tij{t) for the expected number of i-individuals at time t, in a 
population started by a single j-individual at time (a 'j-clone'). In the same way, Tij{T) is the 

■^Note that differentiability of the transition pro babilities is guaranteed in a finite-state , continuous-time Markov 



chain, provided the transition rates are finite, cf Karlin and Taylor (1975, Ch. 4) and Karlin and Taylor (1981 
Ch. 14). 



expected number of descendants of type i at time t + r in a j-clone started at an arbitrary time 
t, cf left panel of Fig. ^. (Note that, due to the independence of individuals and the Markov 
property, the progeny distribution depends only on the age of the clone, and on the founder 
type.) Further, the expected total size of a j-clone of age r, irrespective of the descendants' 
types, is Y^i^ijir). 

Initial conditions come into play if we consider the reproductive success of a clone relative 
to the whole population. A population of independent individuals, with initial composition p{t), 
has expected mean clone size Yli jTij{T)pj{t) at time t + r (note that t always means 'absolute' 
time, whereas r denotes a time increment). The expected size of a single j-clone at time t + r, 
relative to the expected mean clone size of the whole population, then is 

Zj{T,t) := Y,T^,iT)/Y,TkiiT)pe{t) . (8) 
i k,e 

The Zj express the expected relative success of a type after evolution for a time interval r, in 
the sense that, if Zj{T,t) > 1 (< 1), we can expect the clone to flourish more (less) than average 
(this does in general not mean that type j is expected to increase (decrease) in abundance 
relative to the initial population). Clearly, the values of the zj depend on the fitness of type 
j, but also on its mutation rate and the fitness of its (mutated) offspring. (If there is only 
mutation, but no reproduction or death, one has a Markov chain and Zj{T,t) = 1.) 

We now consider lines of descent, as in the right panel of Fig. |2[ To this end, we randomly 
pick an individual alive at time t + r, and trace its ancestry back in time; this results in an 
unbranched line (in contrast to the lineage forward in time). Let Zt+rit) denote the type found 
at time t < t + t, where we will drop the index for easier readability. We seek its probability 
distribution Pr(Z(t) = j). Since the (relative) clone size Zj{T,t) also determines the expected 
(relative) frequency of lines present at time t + r that contain a j-type ancestor at time t, we 
have: 

Pr(Z(t) = j) = ZjiT, t)p,{t) =■ a,(T, t) . (9) 

The aj{T,t) define a probability distribution i^ja,j{T,t) = 1), which will be of major im- 
portance, and may be interpreted in two ways. Forward in time, aj{T,t) is the frequency of 
j-individuals at time t, weighted by their relative number of descendants after evolution for some 
time r. Looking backward in time, aj{T,t) is the fraction of the (p-distributed) population at 
time t + T whose ancestor at time t is of type j. We shall therefore refer to a(r, t) as the ancestral 
distribution at the earlier time, t. 

Let us, at this point, expand a little further on this backward picture by explicitly construct- 
ing the time-reversed process. This is done in the usual way, by writing the joint distribution of 
parent -offspring pairs (i.e. pairs Z{t) and Z{t + T)) in terms of forward and backward transition 
probabilities. On the one hand, 

Fi{Z{t + T) = i, Z{t) = j) = Fi{Z{t + T) = i\ Z{t) = j) Pv{Z{t) = j) ^^^^ 

= Pij{T)aj{T,t) . 

Here, the Pijir) := Pr{Z{t + r) = i | Z{t) = j) may be obtained by rewriting the (conditional) 
expectations defining the (forward) branching process as Tij{T) = Pijir) Tkj{T), which gives 

P,,{T) = Ti,{T)/Y,Tkj{r). (11) 

k 
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On the other hand, 



Pr(Z(t + r) = i, Z{t) = j) = Pr{Z{t) = j \ Z{t + t) = i) Pr(Z(t + r) = i) 

where Pji{T,t) := Pr(Z(t) = j \ Z{t + t) = i) is the transition probabihty of the time-reversed 
process and is obtained from (|To|) and ( |T^ ) as Pji{T,t) = aj{T,t)Pij{T)[pi(t + r)) ^. With Eqs. 
, (P) , and (|l^) , one therefore obtains the elements of the backward transition matrix P as 

By differentiating P(r, t) with respect to r and evaluating it at r = 0, one obtains the matrix Q(t) 
governing the corresponding backward process in continuous time. Its elements read Qji{t) = 
^Pji{T,t)\^^Q = Pj{t){Hij - 6ijR{t)) {pi{t))~^ - 5ijPi{t)/pi{t). Using (D this simplifies to: 

[-l.k^tPk{t)Hik{pi{t)) , 1 = 3. 

Note that the backward process is, in general, state-dependent (it does not generate a Markov 
chain). Note also that time reversal works in the same way if sets of types Xk instead of single 
types are considered, as long as mutation and reproduction rates are the same within classes. 
Furthermore, an analogous treatment is possible both for mutation coupled to reproduction, as 
well as for discrete generations. 

As to the asymptotic behavior of our branching process, it is well-known that, for irreducible 
H and t — > 00, the time evolution matrix exp(t(H — AmaxI)) becomes a projector onto the 
equilibrium distribution p, with matrix elements piZj (e.g. Karlin and Taylor, 1981| , Appendix). 



Here, z is the Perron-Frobenius (PF) left eigenvector of H, normalized such that ZiPi = 1. 
As suggested by our notation, one also has 

lim z(r, t) = z , (15) 

which may be seen from We therefore term Zj the relative reproductive success of type i. 

At stationarity, the matrix governing the backward process simplifies to Q^^ = p^i^Hij — 
^ij^ma,x)P7^ 1 which Can now be interpreted as a Markov generator. Further, the (asymptotic) 
ancestor distribution, given by = ZiPi, turns out to be the stationary distribution of the 
backward process, since Y.i Qji^i = T.iPj{Hij - SijX^^JPi^ z^p^ = T,iPjZ,{Hij - 6^jX^^^) = 0. 
Due to ergodicity of the backward process (Q is irreducible if H is), a is, at the same time, the 
distribution of types along each line of descent (with probability 1). 



^Both z and p also admit a more stochastic interpretation. If the population does not go to extinction, one 
has limt^cx) YAt) / (Y^,, Yj(t)] = almost surely, i.e. the stochasticitv is in the population size, not in the relative 
frequencies ( |Kesten and Stigum, 196^ ; Hofbauer and Sigmund, 1988[ , Ch. 11.5). Further, for the critical process 



generated by H - A^axI, one has limt^^o t Pr(Y(t) / | Y(0) = ej) = Zj/C and limt^oo j( E(Yi(t) \ Y(0) 
ej, Y(t) ^ 0) = Cpi, where C is a constant; this is the continuous-time analog of a result by [Jagcrs (I975I , p. 



94). Note that, in the long run, the expected offspring depend on the founder type only through the probability 
of nonextinction of its progeny. 
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FIG. 3. Equilibrium values of population frequencies pk (dotted line), ancestor frequencies Ofc (dashed 
line), and relative reproductive success (solid line) for the biallelic model with additive fitness Rj. = 
7 {N~k) (where 7 is the loss in reproduction rate due to a single mutation), point mutation rate /i = O.27, 
mutation asymmetry parameter k — ^, and sequence length N = 100. The logarithmic right axis refers 
to the Zfe only. 

2.3 The equilibrium ancestor distribution 

As we saw in the last subsection, there is a simple link between the algebraic properties of H 
and the probabilistic structure of the mutation-selection process at equilibrium, which may be 
summarized as follows. The PF right eigenvector p (with YliPi ~ ^) determines the composition 
of the population at mutation-selection balance; the corresponding left eigenvector z (normalized 
so that Yli ^iPi = 1) contains the asymptotic offspring expectation (or relative reproductive 
success) of the various types; and the ancestral distribution, defined by Oj = PiZi, gives the 
asymptotic distribution of types that are met when lines of descent are followed backward in 
time (cf Fig. ^). Fig. ^ shows p, a, and z for a single-step mutation model with a linear fitness 
function. One sees that z decreases exponentially. 

For the single-step mutation model, we may directly transform the eigenvalue equation Hp = 
AmaxP into an equation for a. To this end, we define a diagonal transformation matrix S with 
non-zero elements Skk = 11^=1 \J^7 l^t-i ^"^^ obtain a symmetric matrix by H := SHS~^. 
The corresponding PF right and left eigenvectors are given by p = Sp and z = S^^z. But 
now, as H is symmetric, we have z ~ p (where ~ means proportional to). Hence, due to 
dk = ZkPk = ZkPk ~ Pki one has pk ~ ^/cLk■ Thus, we obtain the following explicit form of the 
eigenvalue equation for H, which will be crucial for the derivation of our main results: 

[Rk - - \/K-lUk^/^l + \JUkUk+l^/^ = ^max^/^- (16) 

Note that Eq. (|l^) relates the mean fitness of the equilibrium population {R = Amax) to the 
ancestor frequencies a^. 

2.4 Observables and averages 

In this subsection we define the observables, i.e. measurable quantities, that are used to describe 
the population on its evolutionary course. Besides the usual population mean, we shall also 
introduce the mean with respect to the ancestor distribution (see Section [2^ ) . 
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We shall consider means and variances of two observables in the following. To this end, we 
characterize each type (or class) i by its fitness value Ri and its mutational distance Xi from the 
reference genotype (or the class Xq). For the biallelic model in particular, mutational distance 
corresponds to the Hamming distance to S-|-. If, in addition, this is the fittest type, Xi just gives 
the number of deleterious mutations. But in general it can also be used to describe the value of 
any additive trait with equal contributions of sites or loci. Similarly, for single-step mutation, 
we define X^ to be the distance from the class Xq, thus X^. = k for class X^. Again, X^ may be 
viewed as (the genetic contribution to) any character with discrete values that depends linearly 
on the mutation classes. 



Population average. Representing an arbitrary observable as (Oj), such as {Ri) or {Xi), we 
will denote its population average as 

0(t):=^0,p,(t). (17) 

i 

By omission of the time dependence we will indicate the corresponding equilibrium average. 
As to mean fitness, R{t) determines the mutation load, L{t) := i?max ~ R{t). Here, R 



max 



maxj Ri is the fitness of the fittest genotype, in line with the usual convention (see, e.g., Ewens, 



1979 ; Biirger, 2000 ). It is well-known that the equilibrium value R := limj^oo R{t) is given by 
the largest eigenvalue, Amaxj of the evolution matrix H. 

For the variance of fitness, Vj^{t) = J^ii^i ~ '^^ differentiate R{t) according to 

(D, i.e. f^R{t) = RiPi{t) = VR{t) + J2i j RiMijPj{t), and hence 

y^(t) = j^R{t) -Y,R^M^JP,{t) = jR{t) + {EiRj - Ri)M,,)p,{t) . (18) 

i,j j ' 

The interpretation of this completely general formula is as follows: In absence of mutation, 
Eq. ([T^ ) just reproduces Fisher's Fundamental Theorem, i.e. the variance in fitness equals the 
change in mean fitness, as long as there is no dominance (see, e.g., [Ewens, 197S| ). If mutation 
is present, however, a second component emerges, which is given by the population mean of 
the mutational effects on fitness, weighted by the corresponding rates. It may be understood as 
the rate of change in mean fitness due to mutation alone. At mutation-selection balance, this 
second term is obviously the only contribution to variance in fitness. 

For the single-step mutation model in particular, we can define deleterious and advantageous 
mutational effects separately as s^ = Rj^—Rj.^^ and = -R^_]^— iZ^, respectively. For decreasing 
fitness values (which is the usual case, but not strictly presupposed here) these are positive. This 
way we obtain 



Vj^ = S+U+ - s-U- = s+U+ - s- U- + Cov(s+, [/+) - Cov(s", C/") (19) 

for the equilibrium variance, a result we will rely on in the following. 

Just as for the fitness distribution, we define the population mean, X{t) = J2iLo ■^iPii't)^ 
and variance, Vj^{t) = Ylii-^i ~ ^(O)^Pi(O) of the mutational distance. 



Ancestral average. We will also need the ancestral average of our observables, that is, the 
average with respect to the ancestral distribution defined in Eq. (^): 0(r, t) := Y^- Oiai{T,t) = 
Yi Zi{T,t)0^p^{t). In the following, we will only be concerned with the ancestral distribution 
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in equilibrium, i.e. with both t and r going to infinity. We obtain the ancestral average of any 
observable (Oi) in this limit as 

^■.= Y,0,a, = Y,^^O^P^. (20) 

i i 

These averages may be read forward in time (corresponding to a weighting of the current pop- 
ulation with expected offspring numbers), and backward in time (corresponding to an averaging 
w.r.t. the distribution of the ancestors). A third interpretation is available if the mutation ma- 
trix is irreducible, which entails that the equilibrium backward process defined by Q is ergodic. 
Then, with probability 1, the equilibrium ancestral average also coincides with the average of 
the observable over a lineage backwards in time. Note that the information so obtained is not 
contained in the population average, which is merely a 'time-slice' average. The ancestral mean 
adds a time component to the averaging procedure, which provides extra information on the 
evolutionary dynamics. In Appendix ^ we shall show that our ancestral averaging coincides 
with the way observables are evaluated in a system of quantum statistical mechanics. 



2.5 Linear response and mutational loss 

We now come to another interpretation of the equilibrium ancestor frequencies introduced in 



Section 2.2. Consider the derivative of the equilibrium mean fitness with respect to the i-th 



fitness value in a general system of parallel mutation and selection (|T]): 



dR 



d 



J2 ZjHjkPk 



ai + R 



d 
dRi 



(21) 



where we made use of the normalization condition ZjPj = aj = 1 . The ancestor frequency 
ai therefore measures the linear response (or sensitivity) of the equilibrium mean fitness to 
changes in the i-th fitness value.0 A similar calculation for the response to changes in the 
mutation rates results in 



dR 

dMij 



{Zi - Zj)pj . 



Using ( |2lD and (|2^), we can express the equilibrium mean fitness as follows: 



R = R + Y^ ZiMijPj 



dR 



(22) 



(23) 



Let us give a variational interpretation for the ancestor mean fitness as well. To this end, we 
define the mutational loss G of the system as the difference between ancestor and population 
mean fitness in equilibrium. Assume now that we change all mutation rates Mij by variations 



in a common factor /i. From (23) and (21) we then find that the mutational loss relates to the 
linear response of the equilibrium mean fitness to changes in the mutation rates as: 



G := - i? = -/i^ 



(24) 



Actually, this relation holds for arbitrary (clonal) mutation-selection systems, in particular also 
if mutation and reproduction are coupled (in which case the mutation rates are replaced by 
mutation probabilities). 

^If mutation is coupled to reproduction, the linear response to variations in the death rate Di is given by —at. 
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Eqs. (H) and (|2|) may also be used to determine the change in mean fitness if H changes 
to H + AH, to linear order in AH. (Small changes in the fitness values, or mutation rates, may 
be due to environmental changes, or changes in the genetic background.) Clearly, H + AH has 
R + AR as largest eigenvalue, with AR ~ Y^.ARi{dR/dRi) + Y.^ - AMij{dR/dMij) to linear 
order in ARi and AMj^. If only fitness values are affected, (|2l| ) yields 



Ai?~^Ai?iai, (25) 

i 

where the a, belong to the original system. If only the mutation rates change by variations in a 
common factor /i, (|2^ ) leads to 

ARc^-^G. (26) 
We will come to further interpretation and discussion of the mutational loss and the response 



relations in Section 5.1 



2.6 Fitness functions and mutation models 

For many of the results and all of our examples, we will restrict our treatment to the case of 
the single-step mutation model as described by Eq. (^). Although most of our results do not 
depend on this particular choice we will, for simplicity, concentrate on this scheme here, and 
only briefiy discuss possible extensions. We will start out with a discussion of fitness functions 
and mutation schemes in this context. Depending on whether the phenotype or the genotype is 
considered the primary quantity for the model, the inherent approximation mainly concerns the 
mutation or the fitness part, respectively. 

If the Xf^ {0 < k < N) are the values of a quantitative trait on which selection acts, fitness 
may be taken as an arbitrary function of it. The essential assumption, in this case, is that 
genotypes with equal trait values have equivalent mutation patterns, with mutation in single 
steps as an additional simplification. This is the original view in which this assumption first 
appeared, with as the electric charge of proteins ( phta and Kimura, 1973 ). The numerous 



papers to follow have been reviewed by piirger (1998| , pOO0| ). 



If, on the other hand, is the number of mutations with respect to the wildtype (i.e. 
Xfc = A; as in the biallelic model), single-step mutation is a natural approximation and directly 
emerges if mutation and reproduction are modeled as independent processes. The essential 
simplification, in this case, consists in the choice of the genotype fitness values, which depend 
only on k. This way, only the average epistatic effect is included in the model, whereas any 
variance among epistatically interacting mutations is disregarded. Fitness functions of this kind, 
although undoubtedly lacking much of the biological complexity, have been used as standard 
landscapes throughout population genetics literature. While the principal reason for this seems 
to lie in the large simplifications due to permutation invariance, they already take full account 



of the limited information on fitness provided by mutation accumulation experiments (e.g. Crow 



and Simmons, 1983| ; but see also the discussion by Phillips et al., 2000| ) . Further, they include 



a broad range of examples with vastly diverging properties, ranging from simple additive fitness 
over quadratic - or otherwise polynomial or exponential - landscapes with smoothly varying 
fitness values (e.g. Charlesworth, 1990|) to truncati on selection (e.g. [Kondrashov, 198^ ) and 



Eigen's sharply peaked landscape ( [Eigen et al., 1989 ) 



For a consistent treatment of our model in the mutation class limit N ^ oc (to be defined 
in the next subsection) , it will be advantageous to think of the fitness values and mutation rates 
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as being determined by the mutational distance per class (or site), Xk '■= X^/N £ [0, 1]: 



R, = Nr, = Nr{x,) , = Nu^ = Nn^{x,) . (27) 



Here, also and are introduced as fitness and total mutation rates per class. They can 
now be thought of as being defined, without loss of generality, by three functions r and on 
the compact interval [0, 1] . We will refer to r as the fitness function, and to and u~ as the 
(deleterious and advantageous) mutation functions of the model. Both and u~ are assumed 
to be continuous and positive, with boundary conditions u~{0) = W^il) = 0, and r to have 
at most finitely many discontinuities, being either left or right continuous at each discontinuity 
in ]0, 1[. This should include all biologically relevant examples. For the biallelic model, the 
mutation functions are simple linear functions of x: 

u+(x) = ^(1 + k)(1 - x) , u~ (x) = fi{l - k)x . (28) 



Note that the classical stepwise mutation model ( phta and Kimura, 1973 ) is not covered by this 



framework, since its genotype space Z is inherently non-compact. 
2.7 Three limiting cases 

Our primary aim in the following sections is to establish simple relations for the equilibrium 
means and variances of mutational distance and fitness that lend themselves to biological inter- 
pretation. Whereas these relations are approximations in the general case, they rest on three 
limiting cases as pillars, for which they hold as exact identities. All three are biologically mean- 
ingful by themselves, two of them are well studied, and we will show that the formulas reduce 
to well-known results there. 

The first case is the limit of vanishing back mutations, defined by Uf7 = in our model. The 
second one is a limit of linearity, in which fitness and mutation rates depend linearly on some 
trait Yfc = Nyi^. = Ny{xjS) with Iq = and Yat = N , such as 

r(x) = ro — ay{x) , u^{x) = /3^(1 — y{x)) , u~{x) = (3~y{x) . (29) 

Note that, if Yj; is proportional to the mutational distance = k, the fitness function is linear 
whereas the mutation functions reproduce the mutation scheme of the biallelic model if 
(5^ = Ii{1±k). This limit can be understood as the limit of vanishing epistasis, in which the 
system is known as the Fujiyama model in the sequence space literature ( cf [K auff man , 1 99^ ) . 

The third case is the limit of an infinite number of mutation classes, — > oo, which we 
will call mutation class limit for short. In the case of the biallelic multilocus model, this limit 
has been used and discussed in a recent publication (jBaake and Wagner, 2001 ). It addresses 



the situation of weak or almost neutral mutations, where the average mutational effect (over 
the mutation classes) is small compared to the mean total mutation rate, [/ S> s. The limit 
further assumes that differences in mutation rate between neighboring (pairs of) classes are 
small compared to the mean rate itself. In this case, genetic change by mutation proceeds in 
many steps of small average effect and the model is a genuine multi-class model in the sense that 
typically a large number of classes are relevant in mutation-selection equilibrium. Note that 
only the average mutational effect must be small; this includes the possibility of single steps 
with much larger effect (such as in truncation selection). 

Technically, the limit — > oo is performed such that the mutational effects s^ and the 
fitness values and mutation rates per class, r and u^, remain constant. If fitness values and 
mutation rates are defined by the three functions r and as described above (27), increasing 
A^ simply leads to finer 'sampling' of the functions. 
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With this kind of scaUng, the means and variances per class of the observables defined 
in Section 2.4 approach weh defined hmits, which then serve as approximations for the original 
model with finite A^. We will denote them by the corresponding lower case letters, i.e. r := R/N, 
vx ■= Vx/N, etc.; an additional subscript will indicate the limit value, e.g. Xqo := limAr^oo^- 
Note that it is, in general, the variance per class of a given quantity that is meaningful in this 
limit, not the variance of the quantity per class (e.g. \ai{X/N)), which tends to zero (cf Section 
4.4). The described limit is the biological analog of the thermodynamic limit in statistical 
physics. We will further discuss this issue for physically interested readers in Appendix 

Let us finally compare the mutation class limit with the more familiar infinite- sites limit, 
which, when applied to the biallelic model, also leads to a stepwise mutation model with an 



infinite number of classes (as found, e.g., in Biirger, 2000 ). Both limits, however, approximate 



an original situation with a large, but finite number of types in quite different ways. In the 
infinite-sites limit, the original model is extrapolated to an infinite one by adding new states 
at the boundaries, where the population distribution is (assumed to be) small. In contrast, 
the present approach arrives at the limit by interpolation of the types of the finite model. 
Mathematically, this leads to a non-compact state space (such as Z) in the infinite-sites limit, 
whereas the state space in the mutation class limit is a compact interval (bounded by the extreme 
types of the original model). To approximate biological observables of the finite model in the 
limit, the approaches use different scaling. In the infinite-sites case, the range of Malthusian 
fitness parameters R usually diverges (depending on how the extrapolation is done), while the 
total ('genomic') mutation rate U is kept constant. In the mutation class limit, both R and 
U diverge with A'^, but the ratio U/R\s kept constant. These differences in scaling result in 
different ranges of validity of the two limits. The mutation class limit assumes C/ ^ s, it is 
accurate if the total mutation rate is large or fitness differences are small, and allows a sizable 
fraction of sites to be mutated {x = X /N may approach a non-zero limit). In this article, we are 
mainly interested in this regime, in particular in Section |^, where we discuss error thresholds. 
Infinite-sites models, on the other hand, typically assume U <^ s. Then back mutations can 
be neglected, and the bulk of the population is concentrated on just a few classes with a finite 
number of mutations. 



3 Results for observable means and variances 

In this section, we want to give a short summary of our main findings for the single-step mutation 
model. Derivations and a more extended discussion are postponed to Sections ^ and ^. 

A key result of this article is the following estimate of the equilibrium mean fitness, which 
states a maximum principle and holds as an exact identity in the three limiting cases described 
in the preceding section: 

f ~ foo = sup (r(x) — g{x)) ■ (30) 

xG[0,l] 

Here, the function g is defined as twice the difference between the arithmetic and geometric 
mean of the mutation functions: 

g{x) = u^{x) + u~{x) — 2y^u~^{x)u^{x) . (31) 

For reasons that will become clear in Section |5.1| , we will call it mutational loss function. For 
the biallelic model, it reads explicitly: 

g{x) = h(i + k-2kx- 2y/ {1 - k'^)x{1 - x)) . (32) 
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In general, foo describes the equilibrium mean fitness f to leading order in and to next to 
leading order in u~ . The approximation is indeed rather accurate already for moderately large 
N and/or weak back mutation rates, cf Section |5.3| and the examples in Section ^. 

The maximum principle ( |30| ) is closely linked to the ancestor distribution. In particular, if 
the maximum is attained at a unique value, this is precisely the ancestor mean Xoo- 

foo = r{xoo) - g{xoo) = -Too - 5(^00) , (33) 

where the relation r(a;oo) = ^00 can be proved for all three limiting cases. A corresponding 
relation for the population mean Xoo, 

foo = r{xoo) , (34) 

holds in the mutation class limit and the linear case, if this equation has a unique solution (e.g. 
for strictly monotonic r). 

The variances per site of fitness and of distance from wildtype are then given by 

Vr,oo = -r'{xoo){u^ixoo) - u-{xoo)) and = ,,'"'°°,,2 , (35) 

ir'{xoo)) 

provided r is differentiable, in which case — ^'(xoo) is the population mean of the mutational 
effects. For the biallelic model this is explicitly: 

Vr,oo = -r'{xoo)fJ'{l + K-2xoo) and vx,oo = - ^ (36) 

If r has a jump discontinuity at xjump from to r~ and we have ^ Too — ^ 5 then x^o — ^jump 
and vr^oo diverges. In this case, Vr^oo = limTV^oo Vr/^'^ is finite (cf the example of truncation 



selection in Fig. 13 and the one in Fig. |ll|): 

K,oo = (^+-^oo)(foo-r-). (37) 

The results presented here lead to simple graphical constructions of the means as shown in 
Fig. ^ This allows for an intuitive overview over the dependence of these quantities on (the 
shape of) the fitness function and mutation rates, without the need for explicit calculations. 

4 Derivations 

We now come to the proofs and some first interpretation of the results presented in the previous 
section. Our starting point is the mutation-selection equilibrium of the single-step model (^) 
for finite A^, i.e. the eigenvalue equation 

[r(xfc) - u+(xfc) - u~'{xk)]pk + u^{xk-i)pk-i + u'ixk+i)pk+i = fpk ■ (38) 
For most of our calculations, we will use the equivalent equation for the ancestor distribution, 

dm, 



[r(xfc) - u+(xfc) - u ixk)]^/ak + 



^yu+{xk-l)u {xk)^/ak-l + ^/u+{xk)u (xfc+i)^afe+i = r^o^ , (39) 

which is the eigenvalue equation for the largest eigenvalue of the symmetric matrix H. For the 
latter, Rayleigh's principle is applicable, which is a general maximum principle involving the full 



subsections we will show, for each of the three limiting cases (cf Section p.TD separately, how 



(A^ + l)-dimensional space: r = svcpyY^k iVk^keyi/ "^kVk^ with non-zero y. In the following 



it boils down to the simple scalar maximum principle (30) and the relation (|33|), and give a 



biological interpretation. We will then come to the derivation of the other identities. 
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4.1 Unidirectional mutation 

We start with the hmit of unidirectional mutation, since exclusion of back mutations leads to a 
considerably simpler situation, and we can show how our findings connect to well-known results. 
To be specific, we assume = and > for /c < A^. All results then follow fairly directly 
from the equilibrium condition (p^). 

Owing to = 0, the equilibrium distribution p in general depends on initial conditions. 
But tt^ > implies that for any such p, there exists a particular label k, < k < N, which 
divides all classes of genotypes into two parts according to 

Pk = 0, k < k, Pk > 0, k > k . (40) 

Equivalently, we obtain for the corresponding left eigenvector z: 

Zk = 0, k > k, Zk > 0, k < k . (41) 

Since Uk = Pk^ki this shows that the only non-zero element of the ancestral distribution is = 1, 
and that k is the equilibrium ancestor mean X of the mutational distance from the reference 
class Xq. In line with this, the mutational distance of every line of ancestors in equilibrium 
dynamics converges to k (with probability 1). For the classes with non- vanishing frequency, the 
fitness and total mutation rate are thus related according to 

f — f = r(x) — f = n^(x), r{xk) — f < n^(xfc), k > k , (42) 
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the first part of which corresponds to (|33D . 

Although the equilibrium distribution is not unique, (42) implies that the one with maximal 



mean fitness (which is the only stable one and is automatically obtained in the limit of vanishing 
back mutations, 0, or by starting with a population with pQ{t = 0) > 0) is characterized 

by 

f = r(x) — u~^{x) = max (r(xfc) — M^(xfc)) (43) 



for arbitrary choices of r{xi.) and u~^{xk). Obviously, (p3|) is the discrete version of the maximum 
principle given in Eq. (|30|). 

If the sequence r(rEfc) or the sequence u^(xfc) is monotonically decreasing (as in the biallelic 
model), k is also the fittest class present in the equilibrium population: 

f = r{x) = max {r(xfc) | Pfc 7^ O} . (44) 



If additionally k coincides with the class of maximal fitness, i.e. f = rmax, then ( [42D is a special 
case of Haldane's principle, which relates the mutation load / to the deleterious mutation rate 
of the fittest class ( [Kimura and Maruyama, 1966 ; Biirger, 1998): 



I = rma.^ - r = u'^ (x) . (45) 



In derivations of (variants of) Eq. (|45|), it is often tacitly assumed that the equilibrium frequency 
of the fittest class is non-zero. This, however, is in general not the case and must be made 
explicit here since we are also interested in the change of the equilibrium distribution with 
varying mutation rates. This can lead to a shift in k and hence in f. 



4.2 The linear case 

If fitness values and mutation rates depend linearly on some trait Y, as described in (|29|), the 
maximum principle holds as an exact identity. This may be derived from (p^) by a short direct 



calculation, which we present in Appendix B.l 



For an interpretation of this result, first consider a trait proportional to the mutational 
distance X from the reference class, in which case the system coincides with the Fujiyama 
model. Since this is a model without epistasis, the means and variances are easily obtained 



( O'Brien, 1985 ; Baake and Wagner, 2001 ). In particular, they are independent of the number 
of classes. What is more, our derivation shows that they only rely on a linear dependence of 
fitness and mutation functions on some trait, as well as the boundary conditions for the mutation 
functions. This means that they remain unchanged if mutation classes are permuted, or even 
subjoined or removed. 



4.3 Mutation class limit 

Since the proof of the maximum principle (^) and the relation ( |3^ ) in the limit ^ cxd is 
somewhat technical we will just give a sketch here and defer the details to Appendix |B.2|. The 
main idea is to look at the system locally, i.e. at some interval of mutation classes in ( p8|) and 
(p^. This will provide us with upper and lower bounds for the mean fitness of a system with 
finite N (denoted by f^). In the limit 00, they can then be shown to converge to the same 
value foo = limAT^oo ''at- 

For a lower bound, let us consider submatrices of the evolution matrix H that, for any class 
iYfc, consist of the rows (and columns) corresponding to Xk-m through Xk+n- Each of them 
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describes the evolution process on a certain interval of mutation classes at whose boundaries 
there is mutational flow out, but none in. Thus, each largest eigenvalue, rk,m,m corresponding 
to the local growth rate, is a lower bound for f^. In order to estimate fk,m,n-, it is advantageous 
to use the formulation in ancestor form - with the same local growth rates as largest eigenvalues 
of the corresponding symmetric submatrices of H. Here, lower bounds can be found due to 
Rayleigh's principle, and follow from evaluating the corresponding quadratic form for the vector 
(1,1,..., 1)^: 



1 



n + m + 1 



k+n 

E 

l=k—m 



(46) 



Uc U 



£+1- 



where 5^^^ = M+ + - 

For an upper bound, consider a local maximum of the ancestor distribution, i.e. a k'^ such 
that a;j+ > flfc+iti (with the convention aN+i = o-i = such a maximum always exists). 
Evaluating (|39|) for this then yields the inequality 



< sup [r^ 
k 



9N,k) ■ 



(47) 



Let now = r{xk) and = u^{xf,) be given by continuous functions as described in 
Eq. (p7|), and analogously Qj^ j. = gN{xk). (The more general case with a finite number of steps 
in r is treated in Appendix |B.2| .) For an increasing number of mutation classes, fitness values 
and mutation rates of neighboring classes will then become more and more similar on the scale 
of the total range of values. More generally, we can use that Xk — Xk±i = for any 

finite i as N ^ co. Defining, for each x G [0, 1], an appropriate sequence (/cat) = {kj\f{x)), such 



that 



we therefore obtain r{xkf^±i) — gN{xkM±i) ~^ ^(^) ~ 9ix), with g{x) as defined 



m Eq. (P). Evaluating 

T~kN,m,n for increasing submatrix dimension n + m — > cx3 in that limit, 

we have lim„+m^oo limAf^oo '^fcjv.m.n = '''{x) — g{x) for each x. Combining this with the upper 



bound (|47|), in which supfc( 
to the uniform convergence qn 



gN,k) < SUp^G[0,l](K^ 

-> g (see Appendix 



- 9n{x)) 
gives 



suPx&[o,i]ir{x) - g{x)) due 



sup (r(x) - g{x)) < 



< 



sup 

xG[0,l] 



{r{x)-g{x)) 



(48) 



which implies the maximum principle (30). As shown at the end of Appendix B^, the ancestral 
distribution is sharply peaked around those x at which r{x) — g{x) is maximal. Thus, whenever 
the supremum is unique (which is the generic case), Eq. (|33[) follows. 



4.4 Mean mutational distance and the variances 

In this subsection, we derive and discuss the results for the mean mutational distance and the 
variances, which hold in the linear case and for iV — > oo. 

If fitness is linear in an arbitrary trait yk = y{xk)i the relation f = r{y) is immediate. For the 
variance formulas, we must additionally assume that fitness is linear in the mutational distance, 
r(x) = Tmax — ctx, or, equivalently, that all mutational effects are equal. Thus, the covariances 
in the general formula (|T^ vanish, and Vj^ = a (n"*" — u~). Due to linearity, this also determines 
the variance in mutational distance as Vj^ = (n"*" — u~)/a. These relations do not require that 
u^{x) are linear in x; they reduce to (p5|) if this is the case. 

In the mutation class limit, let us first assume r to be continuously differentiable on [0, 1] 
with derivative r' . Expressing vr^oo as the limit variance for increasing system size N, and using 
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([l^ for the variance of each finite system, vr^n, we obtain 



N 

VR,^ = lirn^ Yl C N-l'^^ < - ^^^7^31^ ^k)pk = -r' iu+-u-)^ . (49) 
fc=0 

Here, we made use of the fact that the mutational effects converge to the corresponding values 
of — r', i.e. the negative slope of the fitness function. 



Since r' is bounded, (49) in particular shows that f_R,oo is finite, and hence 



K,oo = lim 

N^oc 



N N 
k=0 k=0 



lim N'^VR^N = . (50) 



For increasing A^, the distribution of fitness values per class therefore concentrates around r. In 
the limit, if r is invertible at foo, this fixes the mean mutational distance at Xoo = r~^{foo), cf 



(34), which approximates the mean distance xjsf = X^/N of a finite system to leading order in 



-1 



N 

With this, we have vr oo — — r'{xoo){u+{xoo)-u (xoo)), cf (H), which approximates vr^n = 
Vr^n/N. Note that the leading order term w.r.t. A^~^ is proportional to — r'(xoo)) which is the 
population inGan of the mutcitional effects in the limit: s~^j^ — > s~^qq — 5 oo — —r'{xoo). (The 
local curvature of r only gives rise to higher order corrections.) Obviously, the leading order 
depends only on the effective deleterious mutation rate, u~^{xoo) — u~{xoo), if this does not 
vanish. Otherwise, the dominant term is of higher order in A^~^. 

The variance in x can be obtained via the linesir cipproxinicition r(x) — t(^Xqq^-\-t^(^Xqq^(^x — Xoo) 
as Vx.oo — "^R.oo /{r'{xoo)Y, cf (|35|). In contrast to f^, Vx decreases with increasing mutational 
effects at x. Interestingly, \/v^Jv^ can serve as an estimate for the mean mutational effect (at 
least in our simple setup) - a quantity which is difficult to determine experimentally. For our 



numerical examples in Sections 5.3 and ^, this works reasonably well (not shown). 

Comparing the results with those for the linear case above, we see that, given f, the infinite 
mutation class limit can be interpreted as a local linear approximation. This does not mean, 
however, that nonlinearities (i.e. epistasis) are ignored. They enter indirectly through the mean 
fitness as determined by the maximum principle. 

For fitness functions with kinks, the derivation is analogous, as long as the left- and right- 
sided limits of r', and thus the mutational effects in the limit — > oo, remain bounded. If r' 
diverges at Xqo, or if there is even a jump in the fitness function, diverges according to the 
above relation. In the latter case, Vr^oo is finite and determined by the fraction of the population 
below and above the jump, which yields 



5 Interlude: Applications and Discussion 
5.1 Mutational loss 

A central role in this article is played by the mutational loss G, which was defined in Eq. ([2^ ) 
as the difference between the ancestor and population mean fitness in equilibrium. Let us now 



add some further interpretation to this quantity. Recapitulating relations (22)-(24), we obtain 
for g := G/N in the framework of the general mutation-selection model (||): 
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It is instructive to compa/rG q with, the muta-tion loctd I — ^max — 

r. Both quantities describe 

the effect of mutation on the equihbrium mean fitness. But whereas the mutation load compares 
the biological system with a fictitious system free of mutations, the loss is essentially a response 
quantity: In analogy with (^), we have 6f ~ —{6fj,/fi)g. Since mutation rates are usually not 
switched on or off in nature, but may be subject to gradual change, the mutational loss seems to 
be the quantity of more direct relevance for questions connected with the evolution of mutation 
rates. 

From the above, we see that the loss can be understood as the linear component of the load. 
In particular, loss and load will coincide if the latter is linear in This holds for unidirectional 
mutation as long as the wildtype has non-vanishing equilibrium frequency (and, more generally, 
below a wildtype threshold, see Section 6.2.2 ), where rmax = ^- In general, however, also non- 
linear terms in /i will contribute to the load and we find I > g. 

The genetic load concept has often been criticized, since the reference genotype (usually 
the one with maximum fitness) is often extremely unlikely to be found in the population at 
all. This argument is made precise by lEwens (1979| , Ch. 9.2) and |Gillespie (199l| , Ch. 6.2) for 
the substitution and the segregation load in finite populations. An analogous point may be 
made against the mutation load, even in infinite populations: The equilibrium frequency of the 
fittest class is often close to zero (or may even vanish for unidirectional mutation). Therefore, 
measurements of rmax in real populations are difficult, if not impossible, and the evolutionary 
significance of the reference type seems questionable. 

This problem is circumvented in the definition of the mutational loss. As a response quantity, 
g is well-defined as long as it is meaningful to think of a system as in equilibrium. Measurements 
of g could make use of marker techniques in (bacterial or viral) clones in order to determine 
clone sizes (and thus z) and ancestor frequencies, or determine directly the response of f to 
changes in mutation rates, e.g. by comparing strains with different mutation repair efficiencies. 

Up to this point, we have entirely concentrated on the mutational loss as a response quantity. 
There is, however, a second line of interpretation, which clarifies the role of g in the equilibrium 
dynamics and also sheds some light on the maximum principle. If an individual mutates from 
j to i, its offspring expectation changes by — Zj, where the sign determines whether a loss 
(-|-) or gain (— ) is implied. Since the mutational flow from j to i in equilibrium is M^jP^ the 
entire system loses offspring at rate J2iji^j ~ ^i) ^'hjPj^ which is the same as G (compare with 
Eq. (ID or (13)). 

The mutational loss does not include any information about the destination of the 'lost' 
offspring. This, however, may easily be found by recalling that, asymptotically, every ancestor of 
type i leaves ZiPj descendants of type j in the equilibrium population. Further, pi{zi — l) = ai—pi 
is the excess offspring produced by an z-individual. We thus come to a picture of a constant 
flow of mutants from the ancestor to the equilibrium population. 

Let us now turn to the mutational loss function g{x). Recall that, in the derivation of the 
maximum principle in the mutation class limit, we obtained r{x) — g{x) as the leading eigenvalue 
of a local open subsystem around x; if fao is the death rate due to population regulation in the 
entire system, r{x) — foo — g{x) is the net growth rate of the subsystem at x. Hence, g{x) must 
describe the rate of mutational loss due to the flow out of the local system. This can be made 
more precise within the framework of large-deviation theory, which will be presented in a future 
publication. If r{x) — g{x) has a unique maximum, in which case the ancestor distribution has 
a single peak, the maximum principle ( [30| ) along with f and f — > r{xoo) as N ^ oo 

implies that the mutational loss g = G/N converges to g(xoo)- Thus, g{x) can be taken as an 
approximation to the actual mutational loss g in this case. 

Let us finally add a remark concerning the influence of epistasis on the mutational loss (in the 
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sense of a response quantity) in the single-step model. Following the suggestion of Phillips et al 
(2000 ), we speak of negative (positive) epistasis w.r.t. some class k \i Rk+i — Rk < (>) Rk — Rk-i- 
This entails synergistic [antagonistic) interaction of deleterious mutations. This way, negative 
and positive epistasis are connected to concavity and convexity of the fitness function, and thus 
to its second derivative (if well-defined) being negative, respectively positive. 

Let us now keep the mutation rates fixed and compare fitness functions with different degrees 
of epistasis. Let (7 be a decreasing loss function, and r and rg two decreasing fitness functions 
which are either convex or concave, and only differ in an open subinterval of [0, 1] that includes 
X (the ancestral mean trait under r). Assume that rg — r is a concave function in our subinterval. 
Then rg describes more negative, or less positive, epistasis than r. Under the above assumption, 
rs — r has a unique maximum, whose position we denote by xq. As is most easily seen from 
the graphical representation of the maximum principle (Fig. ^), one then finds x^ > x whenever 
xq > X (and vice versa), where Xg is the ancestral mean trait under the modified fitness function. 
Since g{x) is decreasing, it follows that g{xs) < g{x) if xq > x. If x is small (as may be considered 
typical of realistic examples), increased negative epistasis will reduce the loss. The opposite may 
be said of decreased negative or increased positive epistasis, in line with the fact that the loss is 
maximal for the sharply-peaked landscape, which displays extreme positive epistasis. 

5.2 Haldane's principle and evolution of mutational effects 

As we have seen in the discussion of the unidirectional case, the maximum principle reduces 
to a well-known form of the Haldane-Muller principle in that limit. Using the concept of the 
ancestor distribution, we will now re-analyze this principle in the broader context of models 
with back mutations. We will also discuss consequences for the evolution of mutational effects 
and mutational robustness. 

For models without back mutations to the fittest genotype with non-zero equilibrium fre- 
quency, Haldane's principle says that the difference in fitness between this type, i, and the 
population mean is equal to the total mutation rate for i. For our general model this reads 
L = i?max — R'i + yZn-^^'i where M--- is the mutation rate from the fittest type (or class) i to 
some other type (or class) j ^ i. Note in particular that the load is independent of the mutant 
fitness values if the wildtype itself has non-zero frequency in equilibrium, i.e. i = 0. We will 
assume Rq = Rma.x for simplicity in this section. 

If back mutations to the fittest class are present, but mutation rates, denoted by n, are small 
compared to the fitness advantage, u <C s, the relation for the load is modified by a correction 
term of order u'^/s ( [Biirger and Hofbauer, 1994| ). In the following, we will reproduce this result 
in our setting by deriving an explicit expression of the correction term for the single-step model. 
We will also show that this leading order contribution of the back mutations is exactly contained 
in the estimate of f as derived from the maximum principle. 

Let us assume, for notational simplicity, that the wildtype is also the fittest type present in 
the equilibrium population, and remains so if back mutations are switched off. Suppose that 
the back mutation rates u'^ are small compared to the fitness effects, but not necessarily the 
deleterious mutation rates u'^. We then obtain, to linear order in , 



+ df 



Pi 

Po 



"1 > 



(52) 



where we have used Eq. ( p2[ ) for the derivative of r with respect to the mutation rates, and 
zq = l/po, zi = for = 0. Calculating pi/po from the equilibrium condition for the 
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mutation-selection equation, we find 



+ +0(KF)- (53) 



This is in accordance with the result of Biirger and Hofbauer (1994 ) if &lso Uq,u^<^ Sq 



On the other hand, starting with a linear interpolation of the fitness and mutation functions 
of the form r{x) = ro + Nx{ri — ro), u'^{x) = Uq +Nx{u'l — Uq), and u~{x) = Nxu'^ for < x < 
we find the load by using f from the maximum principle. To linear order in m~, a lengthy 
but elementary calculation yields that r{x) —g{x) is maximized at Nx = UqU'^ /{sq —Uq+Ui)"^^ 
and we again obtain Eq. (^) for the load. We can therefore conclude that the maximum 
principle, when applied to finite A^, still gives results that are correct to linear order in the back 



mutation rates (cf Section 5.3). 

In the preceding paragraphs, back mutations have merely played the role of a small perturba- 
tion of the system with unidirectional mutation. Our main interest in this article, however, lies 
in the case of sufficiently large mutation rates - or sufficiently small fitness effects of mutations 
(as in a nearly neutral landscape) such that the equilibrium distribution is no longer dominated 
by one or a few wildtype states, but is dispersed over many classes. This is exactly the situation 
in which one would assume back mutations to become important, with effects beyond a second 
order correction term. At the same time, this is the domain of validity of the mutation class 
limit, in which the maximum principle is also exact. We then obtain 

/ = W-r(£)+5(£) [<5(0) =^x+(0)] (54) 

as an estimate for the mutation load. Clearly, the load is no longer independent of the fitness 
function as soon as the ancestral mean fitness f = r[x) differs from the wildtype fitness. Note, 
however, that the only quantity that matters is the deviation of the ancestor mean fitness from 
the wildtype fitness. 

It is instructive to compare the load for different fitness functions. Let and r be fitness 
functions with r(0) = rs(0) = rmaxi and rs(x) > r{x) for all x E [0,1]. By the maximum 
principle, the load with rg cannot be larger than with r. If rs(x) > r(x) at x = x, the ancestral 
genotype under r, the load with rg is strictly smaller than with r. In this sense, higher mutant 
fitness tends to decrease the mutation load (and vice versa). 

Let us now extend these thoughts to the evolution of mutational effects. To this end, we 
consider a general mutation-selection model (i.e. not restricted to permutation invariant fitness 
or single-step). Assume there is an additional modifier locus, which is tightly linked to the other 
loci and changes the fitness of one or several of the original types or classes. (In the biallelic 
model, this may, for example, happen through epistatic interactions outside our permutation 
invariant fitness scheme.) 

Let now a modifier be introduced into the equilibrium population at low frequency at time 
t = (by mutation or migration), and consider its fate for t — > oo. If there is no further 
mutation at the modifier locus, the modifier will asymptotically fix (or get lost) in terms of 
relative frequencies, p{t) = y(t)/{J2i Uii't)), if the modified system has a larger (smaller) leading 
eigenvalue than the original one, in which case we write 6f > {5f < 0). If 5f = 0, the modifier 
will equilibrate at an intermediate frequency, the exact value of which depends on the initial 
conditions. 

The above argument is analogous to the clonal competition mechanism as described for 



mutation rate modifiers in asexual populations (for review, see Sniegowski et ai, 2000 ). It 



requires slight modification if mutation at the original loci is unidirectional. Here, the fate of 
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the modifier also depends on the genetic background it is introduced into at i = 0. If the fitness 
modifications are so small that the fittest type present remains the same in equilibria with and 
without modifier, the modifier will always get lost if it does not already occur in individuals of 
that type at t = 0. This follows since all other types asymptotically expect no offspring. 

Note that the competition mechanism just described works within the population, the sepa- 
ration of genotypes with and without modifier being due to tight linkage of the modifier to the 
primary loci. In particular, no group selection is implied. 

What consequences, now, does this have for the possibility of mutational effects to evolve? 
Again, the answer involves the ancestor distribution. We have seen in (|2^) that changing the 
fitness values rj to + 5i will change the equilibrium mean fitness by 



5r ~ ^ 5iai , (55) 



to first order in the 5i. From this, we now obtain the following intuitive picture: In order for a 
modification to prevail in an equilibrium population, it has to invade the ancestors; otherwise, 
it will be 'washed away'. 

Let us discuss this in some more detail. According to our above discussion, the fate of the 
modifier is entirely determined by 6f if we have back mutations. Now, the right hand side 
of Eq. ( |55D may be interpreted as the selection coefficient of the modifier with respect to the 
ancestor distribution - assuming that the modifier is statistically independent of the other loci. 
In order to understand why this quantity governs the leading order of 6f, consider infinitesimal 
small fitness changes 6i, in which case Eq. (^) becomes exact. Here, mutation will indeed drive 
the modifier distribution towards statistical independence in an initial period of time. In order 
to eventually spread to fixation, the modifier now has to compete successfully against those 
types whose descendents make up the equilibrium population at an even later time. These, 
however, follow the ancestor distribution. In this sense, 5f may be understood as measuring the 
modifier's growth within the ancestor population. In the same vein, the vector of the ancestor 
frequencies can be seen as the gradient of the mean fitness, pointing into the direction of the 
indirect (i.e. second order) selection pressure exerted on the fitness values r^. 

This long-term picture is in sharp contrast to the initial growth of the modifier in the popu- 
lation, which is determined by its selection coefficient with respect to the equilibrium population 
and of course depends on the distribution of the modifier over the types at t = 0. If 6f is positive 
(negative), the modifier will asymptotically fix (vanish) even if its initial selection coefficient is 
negative (positive). Note, however, that this process may be very slow if 5f is small. 

If there is no back mutation to the fittest class (or type) i present, this is the absorbing 
state of the backward process, in which all lineages end, and the ancestor distribution is entirely 
concentrated there (a^ = 1, aj = 0, j ^ i). So Eq. (|55[) leads back to the prediction of Haldane's 
principle that the mean fitness is independent of the mutant fitness values in this case. In order 
to 'invade the ancestors', a modifier must be introduced into the fittest type in the first place, 
and increase its fitness. 

Assume now that the wildtype fitness is kept fixed but mutational effects at the wildtype are 
modified by variations of the mutant fitness values. Such modifiers are canalizing (or modifiers 
for mutational robustness) if they increase the mutant fitnesses, and decanalizing (modifiers 



for antirobustness) if they decrease the mutant fitnesses ( Wagner et al, 1997 , cf). It is now 
clear from Eq. ( [55| ) that only an increase of mutant fitness values may lead to an evolutionary 
advantage. Independently of the fitness landscape or of mutation patterns, we thus never find a 
potential for the evolution of antirobustness in mutation-selection models; however, mutational 
robustness may, indeed, evolve. Here, modifiers increasing the fitness of mutant classes with large 
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FIG. 5. Comparison of population frequencies pk (near k = 40) and ancestor frequencies (near 
k — 15) for the biallelic model with fi — 0.3657 (where 7 is the loss in reproduction rate due to a single 
mutation as in Fig. ^), k = i, and N — 100. The right axis refers to the fitness functions used: additive 
fitness r{x)/'y = 1 — x (solid lines), and a modified version (dashed lines) that is favored with respect 
to the additive one. The modified fitness is increased in regions of high ancestor frequencies. In this 
particular example, it is slightly decreased at the wildtype and unchanged in other regions of vanishing 
ancestor frequencies, but note that the success of a modification is independent of the fitness values there. 



ancestor frequencies will be under particularly large (positive) selection pressure. If modifiers 
have deleterious side-effects, these may even be the only ones that persist and go to fixation.^ 

Let us, for further analysis, consider two limiting cases of the mutation scheme now. If 
mutation is unidirectional, neither modifications for robustness nor for antirobustness will change 
the mean fitness (at least under the usual assumption that the wildtype is present in the original 
equilibrium). We may conclude that there is no selection pressure on the mutant fitness values 
at all in this simple setting, and hence no potential for these to evolve either.^ On the other 
hand, if the mutation matrix is symmetric, Mij = Mji, the ancestor frequencies are proportional 
to the square of the population frequencies, Oj ~ . Thus, the landscape is evolvable exactly in 
those regions in which the equilibrium frequency of the population distribution is high. 

Note that we may come to different results here depending on whether genotype classes or 
single genotypes are the relevant entities. If mutation between genotypes is symmetric (as in our 
biallelic model with k = 0), modifiers of single genotypes will be particularly important if the 
corresponding equilibrium frequency is high. For modifiers of whole genotype classes, however, 
the asymmetric mutation scheme with respect to the classes is relevant, and the maximum of the 
ancestor distribution will in general deviate from the maximum of the population distribution. 

In order to see what happens between these limiting cases, let us restrict our discussion again 
to the single-step mutation model. Here, the ancestor distribution becomes sharply concentrated 
around x with an increasing number of mutation classes (cf Appendix B.2). Similar to the case 
of unidirectional mutation, only a very minor part of the fitness function will thus experience 

^Note, however, that no predictions are made here concerning invadability of modifier mutations, or fixation 
probabilities, if random drift becomes a weighty factor. 

®For very large, but finite populations (where MuUer's ratchet does not operate but there is drift among classes 
of equal fitness) the fixation probability of clones wi th and without the modifi er is ultimately determined only 
by the initial sizes of the respective wildtype classes (Gabriel and Burger, 2000). Any modifier which enters the 
wildtype class at low frequency will therefore get lost from this class and, consequently, from the population with 
high probability. 
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appreciable selection pressure. Note that this part need neither extend to the types which 
contain the bulk of the equilibrium distribution (concentrated around f < f), nor the largest 
fitness values at rmax > f"- If robustness modifiers have deleterious side effects, only those which 
lead to buffering in the ancestor region will prevail at all. Therefore, if robustness evolves by the 
mechanism described, the strongly differential selection pressure might lead to the emergence 
of synergistic epistasis at the same time. This is illustrated in Fig. where modification of the 
fitness function leads to a flattening near its 'summit' at a; < x relative to the 'slope' at x > x. 
The example also shows that an increase in fitness around f may compensate for a deleterious 
side-effect of the modifier mutation which decreases the wildtype fitness. 



5.3 Accuracy of the approximation 

In this subsection we wish to illustrate the accuracy of the analytical expressions for means and 
variances given in Section |3[ To pay respect to the invariance of the equilibrium distributions 
under scaling of both reproduction and mutation rates with the same factor, we introduce 7 
as an overall constant for the reproduction rates. It should be chosen to represent roughly 
the average effect of a single mutation on the reproduction rate in a mutant genotype (with 
the maximum number of mutations considered) as compared to the wildtype. This does not 
exclude the possibility that effects of single mutations may be quite large. In the figures, both 
reproduction and mutation rates are given in units of this constant, i.e. as r/7, respectively /i/7. 

Fig. ^ displays an example of a biallelic model that deviates from all three exact limiting 
cases described in Section p.7| , and, for comparison, three modifications that are closer to one 
of the exact limits each. All numerical values, also in the rest of this article and in Figs. ^ and 
are virtually exact and, if not noted otherwise, obtained by the power method ( Wilkinson j 



1965| , Ch. 9, also known as von Mises iteration) with the evolution matrix H. For continuous 
fitness functions, the approximate expressions for the observable means agree with the exact 
ones up to corrections of order N^'^ (as indicated by numerical comparison, not shown) or of 
order (u^)^ (cf Section |5.2| ). For fitness functions with jumps, the error seems to be at most of 
order A^-^/^ f^^f ^ig. |l|); for a jump at X = such as in the sharply peaked landscape, however, 
the corrections to f appear to be still of order N~'^ for the biallelic model (cf Fig. 0). 

Further examples, exhibiting more conspicuous features, are shown in Section ^. For most 
of them, one will also find good agreement of numerical and analytical values for the means 
for sequences of length N = 100; for the variances, however, one sometimes needs longer ones, 
like N = 1000. In the biallelic model, we generally find stronger deviations for higher mutation 
rates, as in this regime back mutations become more and more important, whereas for small 
mutation rates, deviations are of linear order in /i. 



6 More applications: threshold phenomena 

In this section, we will take a closer look at how the equilibrium behavior of a mutation-selection 
system changes if the mutation rates are allowed to vary relative to the corresponding mutational 
effects. In order to keep the overall shapes of the fitness and mutation functions constant, we 
vary all mutation rates by a common scalar factor /i > 0. Concentrating on the single-step 
mutation model in this section, we choose /i as the mean mutation rate over all classes, 

N 

/i = (2iV)-i^(n+ + n^) (56) 

fc=0 
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FIG. 6. T he t op row refers to a biallelic model that deviates from aU three exact hmiting cases described 
in Section 2/7 in having a strongly non-additive fitness function r/7 (left, solid line), symmetric site 
mutation (k = 0) , and small sequence length {N = 20) . The mean values of the observables (middle) and 
corresponding variances (right) are shown as a function of the mutation rate fj./j, both for the model 
itself (symbols) and according to the expressions given in Section |^ (lines, sometimes hidden by symbols). 
Even here, we find reasonable agreement. Deviations, however, are visible for larger mutation rates. 
As can be seen from the last two rows, going towards any of the three exact limits, i.e. increasing the 
number of mutation classes (left, N = 100), going to more asymmetric mutation (middle, k — 0.8), or 
using a different fitness function with less curvature (right, r/7: top left, dashed line), we find that these 
deviations vanish quickly. In the case of increasingly asymmetric mutation, however, this is not true 
for the variances, since the approximation becomes only exact here in either of the other two limits (cf 
Section O). 
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(recall that Uq = = 0). This is consistent with the definition of /i as the mean point mutation 
rate for the biallelic model, cf Eq. ( p8| ) and Fig. ||. By slight abuse of notation, we define the 
shape of the mutational loss function as g{l,x) = fi^^g{x) (which does not depend on ^), and 
introduce /i as a variable parameter via g{iJ,,x) = fj,g{l,x). 



6.1 Mutation thresholds 

Consider a population in mutation-selection balance. Usually, if mutation rates change slightly, 
the population will move on to a new equilibrium with the observables, like means and variances 
of traits and fitness, close to the old ones. At certain critical mutation rates, however, threshold 
phenomena may occur, associated with much larger effects on traits or fitness. The prototype 
of this kind of behavior is the so-called error threshold, first observed in a model of prebiotic 
evolution many years a go ([Eigen, 1971 ) and discussed in numerous variants ever since (for review. 



see Eigen et al, 1989 ; Baake and Gabriel, 200[)|) . 



In the following, we will discuss and classify 'error threshold like' behavior in our model 
class. We shall, however, avoid the term error threshold as the collective name for all threshold 
effects that may be observed, but rather, and more generally, speak of mutation thresholds. 
This is because the definition of the error threshold is closely linked to the model in which it 
had been observed originally, namely the quasispecies model with the sharply peaked fitness 
landscape. While many effects of the original error threshold will turn out to generalize easily to 
the much larger class of models considered here, the criterion of the loss of the wildtype, which 
has frequently been taken as the defining property of the error threshold, seems to be applicable 
only in special cases. 

We want to be as general as possible as far as the fitness model and mutation schemes 
are concerned, but specific about the responsible evolutionary forces. Error thresholds have 
also been described as driven by the joint action of mutation and segregation ( Higgs, 199^ ) or 
recombination ( |Boerlijst et al., 1996| ). We will not consider these phenomena. 

Let us now define the notion mutation threshold. Ideally, a characterization should give a 
precise mathematical definition in the modeling framework which, at the same time, captures 
biologically significant behavior. As may be seen from the varying and sometimes incompatible 
definitions that have previously been suggested for the error threshold (see, e.g., the discussion 
Baake and Gabriel, 2000| ), this can be a complex problem. Let us therefore start with a verbal 



m 



description: 

A mutation threshold for a particular trait or fitness is the pronounced change of 
the equilibrium distribution of the trait or fitness values within a narrow range of 
mutation rates. Here, the threshold phenomenon is purely due to the interplay of 
mutation and selection. 

Note that we only consider effects on distributions, not on absolute numbers. This demarcates 
mutation thresholds from mutational meltdown effects (cf pabriel et al., 1993| ) . 

In order to come to a stringent mathematical definition, a two-fold limit must be considered 
for any general mutation-selection model (||). These are the infinite population limit, which we 
assumed right from the beginning, and the limit of an infinite number of mutation classes. 

Application of the infinite population limit is a direct consequence of the last condition in 
the verbal definition above. As mutation thresholds result from mutation and selection alone, 
they must persist in the absence of genetic drift. Hence, unlike drift effects (like Muller's 
ratchet), these phenomena can not be avoided by increasing population size. For the purposes 
of analysis and classification, therefore, deterministic models provide the right framework. Of 
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course, aspects of thresholds should also persist in (large) finite populations, if the phenomena 
are biologically relevant. For some models this has been confirmed in numerical studies (Nowak 



[and Schuster, 198£ ; Bonhoeffer and Stadler, 1993| ): While certain properties of the threshold 



(such as the critical mutation rate) may be altered by finite population size, the threshold effect 
as such is not eliminated by drift. 

The infinite mutation class limit, on the other hand, is needed to give the vague notion 
of a 'pronounced change' a more precise meaning in mathematical terms. Our intention is to 
specify this notion as a discontinuous change of a biological observable (or, at least, of one 
of its derivatives) as a function of /x. In any finite system with back mutations, however, 
this clearly confiicts with the fact that the population frequencies are analytic functions of 
the mutation rates .[] The same problem also arises for the definition of phase transitions in 
physics. Phase transitions, therefore, are defined as non-analyticity points of the free energy 
in the thermodynamic limit (i.e. for infinitely large systems). Since the infinite mutation class 
limit is just the counterpart of the thermodynamic limit in our models (cf Appendix^, we take 
this concept of theoretical physics as our guideline and characterize different types of mutation 
thresholds by discontinuities or kinks in the equilibrium mean and/or variance of some trait or 
of fitness as a function of /i in the limit N ^ oo. (Therefore, we will omit the subscript oo 
throughout this section.) 

Let us add a few comments concerning this strategy: 

1. Firstly, and most importantly, the proposed procedure is in accordance with the origi- 
nal definition of the error threshold: In the quasispecies model, a kink in the wildtype 
frequency (and thus the mean fitness) as a function of the total mutation rate was first 
established by an approximate formula for finite sequence length by Eigen (1971| ), which 



was later found to be exact in the limit N ^ oo ( Swetina and Schuster, 1982 ). The finite 
system is thus effectively approximated by an infinite one. In order to capture the behavior 
of the finite system in the limit, the total mutation rate and the selective advantage of 
the wildtype must scale with the number of classes N (thus leaving the mean mutational 
effect per class constant; cf Franz and Peliti, 1997] ). The equivalence of this phenomenon 



with a magnetic phase transition has first been established by Leuthausser (1987 ), and 
was later used by Tarazona (1992| ) and many others. 



2. Whereas we have introduced the mutation class limit mainly as an approximation for 
real systems with a finite number of classes, its use in the present context rather has a 
conceptual reason. Analogously to phase transitions in physics, the threshold should be 
considered as a property of the limit that manifests itself (as a 'pronounced change') in 
finite systems as well (cf the numerical examples in Figs. |^, ^, and 11-15|). 



Discontinuities in the biological observables can also arise in finite systems if the evolution 
matrix H is reducible (as for unidirectional mutation). Then mutation thresholds can be 
directly defined for finite N. This has previously been done by Wiehe (1997| ) and will be 



discussed in Section 3.S below. 



6.2 Description of threshold types 

Following the lines of the above reasoning, we now come to a description of different types of 
mutation thresholds. In our list we will not include any discontinuous change that might occur, 

^This follows from the Perron-Frobenius theorem and the fact that the PF eigenvalue and eigenvector depend 
analytically on the matrix entries. Since the PF eigenvalue is real and unique under the above conditions, it never 
crosses with the second largest eigenvalue as a function of any model parameter, such as mutation rates. 
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FIG. 7. The error threshold of the sharply peaked landscape (left) with r(0) = 7 (bullet) and 
r{x) = for a; > (line), for the biallelic model with symmetric mutation (k = 0). The observable 
means are shown in the middle, the variances on the right. Symbols correspond to A'^ = 100, lines to the 
expressions in Section ^. The ancestral fitness f{p) (not shown) jumps from 7 to at = 7. Note that 
Vr follows the scaling described by ( ^ ) and is given by (|3^) for — > 00. 



but rather concentrate on pronounced changes of potential evolutionary significance. To this 
end, we will take the original error threshold of the sharply peaked landscape as our reference 
and analyze four of its characteristic properties, namely (cf Fig. |^: 

• A kink in the population mean fitness, 

• the loss of the wildtype from the population, 

• complete mutational degradation, and 

• a jump in the population mean of the mutational distance (or some additive trait). 

For these threshold effects, we will check whether and how they extend to the permutation- 
invariant class of mutation-selection models. We will discuss their origin, analyze how they are 
related, and formulate criteria for the fitness function to exhibit each threshold effect, or type of 
threshold, separately. 



6.2.1 Fitness thresholds 

As we will see below, the kink in the population mean fitness is, in many respects, the most 
fundamental aspect to classify mutation thresholds. We therefore discuss it first. 



Phenomenon. The most pronounced change that may happen to the fitness distribution at 
some critical mutation rate /ic is characterized by a kink in the mean fitness f as a function 
of 11 (i.e. a jump in its derivative). We will refer to this phenomenon as a mutation threshold 
in fitness, or fitness threshold for short. Using Eq. (51) and the maximum principle, we see 
that an alternative definition can be given in terms of the ancestor distribution. Here, a fitness 
threshold is defined by a jump in the mutational loss (as a function of /i), g = g{x) = —fj, dr/dfi, 
corresponding to jumps in x and the ancestor mean fitness r = r{x). As a consequence of the 
kink in f, the mean mutational distance x, and the variances vr and vx, will typically show a 
kink as well. 
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^ x' 1 ^ 

FIG. 8. Graphical construction of the fitness threshold, following Fig. ^. At the critical mutation rate 
/ic, the maximum of r[x) — g{x) is not unique. Thus, with ^ being increased across /ic, the mean of the 
ancestor distribution jumps from a position of relatively high fitness and high mutational loss, x, to lower 
fitness genotypes with less mutational loss at x' . The figure also shows how the population mean fitness 
is constructed at the threshold. 
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FIG. 9. Means (middle) and variances (right) for a biallelic model with asymmetric mutation (k = 0.4), 
and a fitness function r/7 (left) that displays strong positive epistasis near x = 0.15. One therefore 
observes a fitness threshold (/ic/7 — 0-562). Symbols correspond to = 100, dashed lines to = 500, 
and solid lines to the expressions in Section |^. 

Interpretation and graphical representation. The origin of a fitness threshold is easily 
understood from the maximum principle. For a generic choice of ^, the function r{x) — g{n, x) is 
maximized for a unique x = x. For some fitness functions, however, there are particular values 
of fi that lead to multiple solutions. It is precisely this phenomenon of two distinct ancestor 
distributions becoming degenerate with respect to the maximum principle which marks the 
threshold. This may be illustrated graphically as shown in Fig. |8|. 

Let us add a remark concerning the transferability of these notions to the original 'biological' 
model with fixed, finite A^. In defining fitness thresholds in the mutation class limit, we have 
tacitly assumed that the fitness function r reasonably interpolates the discrete fitness values 
of the original model. In order to avoid 'pseudo-thresholds' driven by purely local features of 
the fitness function on a scale smaller than 1/A^, the effects should be stable under different 
interpolations. Note that one way to assure this is to apply the maximum principle only to the 
discrete point set {xk} = {k/N} and ask for a jump in x over more than one mutation class. 
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In any case, the example in Fig. ^ and those in Figs. p^l5 show that the threshold effects are 
usually clearly visible also for finite N. 



Criterion. To derive a criterion for the existence of a fitness threshold for a given fitness 
function r, we use the following argument. According to the above definition, a fitness threshold 
is signaled by a jump in x. Thus, in any fitness landscape without a threshold, £(/u) varies 
continuously from the wildtype position Xmm '■= hm^^o to the position of the mutation 
equilibrium, Xmax := hm^^oo S:{fj.), where g{xmax) = (for the biallelic model, Xmax = (l + '«)/2). 
Therefore, at each x in the half-open interval [xmim 2;niax[ the maximum in (30) is attained for 
some finite fi. If r and are twice continuously differentiable in the closed interval [xmm, a^max], 
then g is twice continuously differentiable in ]xmin; a^max[ and we arrive at the following sufficient 
condition for the non-existence of a fitness threshold: 



Vx G ]x^ 



;[ 3fi > : r' (x) = g' {fi, x) and r" (x) < g" {p, x) . 



(57) 



Expressing fi = ^(x) through the derivatives of r and g, we can state an existence condition in 



the following general form, cf Appendix C.l: 



There is a fitness threshold in the mutation-selection equilibrium at some critical mutation rate 
fic if and only if 



sup 

For the biallelic model, this reads: 



r"(x) 



sup 

[^mirn^max] 



r"{x) 



r'{x)g"{x) 
9'{x) 



-r'(x) 



> 0. 



(58) 



V 



2x(l 



1 - 2x + 2k 



x{l—x) 



> 0. 



(59) 



In the special case that the supremum in (58) is zero, but is assumed only in a single point xq, 
there is actually no jump in x. Here, we obtain limiting cases of a threshold, in the sense that 
a jump in x may be obtained by arbitrarily small changes in the slope or curvature of r or g. 
Typically, this limiting behavior is indicated by an infinite derivative of the function x{^) at 
X = Xq (cf Appendix C.l).^ 



Discontinuities in the fitness function or its derivatives can formally be included in (58) by 
considering left- and right-sided limits separately. For a kink in r, we formally set r" = oo or 
r" = — oo, respectively, if r' increases or decreases at this point (which makes (^) true in the 
former, but not in the latter case). Finally, a jump in r always results in a fitness threshold. 

Note that the criteria presented here do not indicate whether there are one or multiple 
thresholds for a given combination of r and u^. Neither do they provide direct information 
about the value of f at the threshold, or about /^c- In fact, (^) and (|59|) are independent of the 
scalar factor /i, but only depend on the shapes of the mutation and fitness function. Answers to 
these questions, however, are easily derived from the maximum principle for any specific r and 
M^, and may also be obtained from the graphical construction, cf the discussion in the preceding 
paragraph. 



In physics, this kind of behavior corresponds to the important class of continuous phase transitions, cf Ap- 
pendix Ul. In the biological models, however, these non-generic limiting cases do not seem to justify a category of 
their own. 
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FIG. 10. The figure shows, as a solid line, the minimum exponent 9, parametrizing epistasis of the 
fitness function r(x) = (xmax — x)'' , that is needed to obtain a fitness threshold in the biallelic model as 
a function of the asymmetry parameter k of the site mutations rate. The exponent varies continuously 
from quadratic (for symmetric site mutation, k = 0) to linear (for unidirectional mutation, k = 1). 
For q > 2 (dashed line), the fitness threshold is also a degradation threshold (see Section |6.2.3 ). For 
this combination of fitness and mutation functions, a wildtype threshold only occurs for unidirectional 
mutation (k = 1). 



Discussion. Under what conditions should we expect a fitness threshold to exist in a mutation- 
selection system? The above criterion (^) compares r", which measures the epistasis of the 
fitness function, with g" weighted by a factor r' /g'. Under the reasonable assumption that the 
rate of back mutations u~ increases with the distance to the wildtype, whereas the rate of dele- 
terious mutations decreases, we have g' < and r'/g' > for decreasing fitness functions. 
Typically, if the curvature of and is not too large, we also find g" > 0. The criterion then 
shows that a finite minimum strength of positive epistasis (r" > 0, cf the end of Section |5.1| ) is 
required for a fitness threshold. For the biallelic model with r{x) = (xmax — 2;)'?, this is shown 
in Fig. IC. Vanishing curvature or even concavity of the mutational loss function, g" < 0, on 
the other hand, may even lead to thresholds for fitness functions with negative epistasis. 

As will become apparent in Appendix the fitness threshold as defined above is the bio- 
logical counterpart of a first order phase transition in physics. Since x, which translates into 
the magnetization, plays the role of the order parameter, the phase transition is generically first 
order, and continuous only in the limiting case mentioned above. Note that positive epistasis 
with quadratic exponent g = 2 in a biallelic model with symmetric site mutation (k = 0), as 



has been discussed by Baake and Wagner (2001 ), is just such a limiting case. The physical 
analogy shows that a fitness threshold is indeed a true collective phenomenon on the level of the 
sites or loci. The essential self-enhancing effect simply is that in regions of positive epistasis the 
selection pressure decreases with any new deleterious mutation. 



6.2.2 Wildtype thresholds 

The loss of the wildtype is the classic criterion for the original error threshold as defined by 
Eigen (197X1) : For the sharply peaked landscape, the frequency po of the wildtype (or master 
sequence) remains finite for small mutation rates even for ^ 00, but vanishes above the 
critical mutation rate. The same effect may be observed for any fitness function with a jump at 
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the wildtype position rCmin-H Note that this does not depend on whether we assume the wildtype 
class to contain only a single or a large number of genotypes (the latter case has sometimes been 



called the phenotypic error threshold, cf Huynen et al, 1996) 



If r is continuous at Xmin, however, the population distribution spreads over a large number 
of mutation classes with similar fitness for any finite mutation rate. While for finite N the 
frequency in any class remains positive for arbitrary fx (as long as there are back mutations), 
the frequency of any single mutation class (including the wildtype class) vanishes for — > oo. 
According to the original definition, error thresholds therefore depend on strongly decanalized 
wildtypes in the sense that deleterious mutations with small mutational effects are virtually 
absent. While such a model was found to be adequate in certain cases, such as the evolution 
of coliphage Q/3 and certain viruses (cf [Eigen and Biebricher, 1988| ), and could be favored by 



pleiotropy ( Waxman and Peck, 1998| ), slightly deleterious mutations are generally assumed to 



occur in most biologically relevant situations (Kimura, 1983, Ch. 8.7; Ohta, 1998| ). 



Still, one may ask for some related phenomenon that goes together with the loss of the 
wildtype in all models in which this effect is observed,^ but defines a threshold also in a broader 
model class. The fitness threshold as defined above does not meet this requirement, since fitness 
functions with a jump at the wildtype may well have multiple fitness thresholds, but only lose 
their wildtype once. Instead, we will give a definition which is based on the ancestor distribution. 

Phenomenon. We define the wildtype threshold as the largest mutation rate fi^ > below 
which the ancestral mean fitness coincides with the fitness of the wildtype: 

f{fl) = r(0) = Tmax, fJ-< fJ-c ■ (60) 

The threshold may equivalently be defined as the largest n~ below which x{fi) = x^i^. As 
a consequence, the population mean fitness f responds linearly to an increase of the wildtype 
fitness ^ < ^~ , but becomes independent of (sufficiently small) changes in the wildtype fitness 
above the threshold. 

Note that for unidirectional mutation, the ancestral average x (in general) also denotes the 
fittest class with non- vanishing equilibrium frequency for any finite N, cf Eq. (^4[). In this special 
case, the wildtype thus indeed vanishes from the population at Threshold criteria in models 
with special unidirectional mutation schemes have been derived previously, see the discussion 



below in Section 3.3 



Criterion. For a wildtype threshold to occur, r{x) — g{fi,x) must be maximized at x = Xmin 
for some ^ > 0. Assuming r and g to be continuously differentiable for x > x^aim we arrive at 
the criterion 

5(1,3;) - ff(l,Xmin) g'{l,x) 
hm r-r^( ^ = 1™ u N < 00 ' (61) 

see Appendix |C.2| for a proof. Fitness functions with a jump at the wildtype position lead to a 
threshold for any continuous g. 



^As the mean fitness varies continuously, tiie wildtype frequency in the limit decreases linearly with the 
mutation rate, until the mean fitness reaches the lower value at the jump. For larger mutation rates, the wildtype 



frequency in the limit is zero due to the sharpness of the population distribution for A'^ ^ 00 (cf Section 4.4). 

^"i.e. basically for fitness functions with a jump at the wildtype, and for certain models with unidirectional 
mutation, see the discussion in Section |6.3| 
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FIG. 11. 



Means (middle) and variances (right) for a model with symmetric mutation (k = 0), N = 100 

2 



(symbols), and the fitness function r{x) = jl — with an additional single peak of height 7 at 
a; = (left). Due to the latter, one finds a wildtype threshold (/i^/7 — 0.641) 
threshold. Lines correspond to the expressions in Section |^. For 1 < r/7 < | 
variance in fitness no longer follows Eq 



which is also a fitness 
•e. < ^/7 < i, the 
61), but scales diff'erently and is given by (37) for —> 00 (see 



the discussion in Section 4.4). For finite iV, we can approximate vr by a combination of both relations, 
where (^7|) and (^6|) dominate for small and large /i, respectively. Note that f is analytic at = \\ we 
thus have no fitness threshold at this point. 



Discussion. Note first that a wildtype threshold will always lead to non-analytic behavior of 
:e(/u) and f(/i) in and is therefore closely related to a fitness threshold. In general, however, 
it need not show up as a prominent feature with a jump in means or variances as functions of 
the mutation rate. If we have a fitness threshold with a jump in xi^^i) at x = Xmiru however, this 
will also be a wildtype threshold. In a system with a series of thresholds, the wildtype threshold 
(if it exists) is always the one with the smallest /Uc- 

The existence of a wildtype threshold, and also the 'loss of the wildtype' where applica- 
ble, depends on the strength of the deleterious mutational effect at the wildtype, measured by 
^''(a^min)- The degree to which the wildtype requires a fitness advantage to avoid the threshold 
depends on the mutational loss function. If g has a finite derivative at Xmin, we always obtain a 
threshold if the mutational effects do not tend to zero. In many important situations, like the 
biallelic model and Xmin = 0, however, a wildtype threshold requires fitness functions with a 
rather sharp peak, like r(x) ~ —x^ with p < 1/2 or the one used in the example in Fig. 11. Note 
that this result depends on back mutations, which make the slope of g diverge at x = 0. For 
vT = 0, however, the situation changes drastically, and we obtain a threshold if only r'(0) < 0, 
as described above. 

Since g'(O) = «'''(0), we see from Eqs. ( |30| ) and ( |33[ ) that f and x (but not necessarily the 
variances) are unaffected by back mutations for mutation rates below the wildtype threshold. 
Further, the mutation load coincides with the mutational loss, I = rmax — r = r — f = u~^{0), 
and therefore provides a meaningful measure for changes in f if the mutation rate is varied. In 
this sense may be seen as a point up to which back mutations can be safely ignored. 



6.2.3 Degradation thresholds 

Phenomenon. A far reaching effect of the error threshold is that selection altogether ceases 
to operate. We define a degradation threshold as the smallest mutation rate //^ above which 
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the population mean fitness is insensitive to any further increase of the mutation rate: 



— = -g{l,x(p)) = 0, ^i>^4- (62) 

This is equivalent to the condition x(/i) = Xmax for /x > /x^. Also, the other means and variances 
then coincide with their values in mutation equilibrium, and the population is degenerate. 

Criterion. Selection ceases to operate according to the above definition if and only if r{x) — 
g{^J-,x) is maximal at x = Xmax (where 5(/x,Xmax) = 0) for any finite ^ > Since g is 

continuous and strictly positive for x < Xmax and > 0, it is sufficient to compare the asymptotic 
behavior of r and g in the neighborhood of a^max, cf Appendix 



r(x) - r(a:max) r'{x) 
lim — = lim — - — - < oo . (63) 

X^Xmiix g{l,X) X^Xmiix g' [1, X) 

Discussion. The degradation threshold is related to the fitness threshold in an analogous way 
as the wildtype threshold above. In particular, we always find non-analytic behavior of x{p) and 
f(^) at but not necessarily a jump or a kink. However, a fitness threshold with a jump of 
x{iJ,) onto Xmax IS neccssarily a degradation threshold. If there is a series of thresholds connected 
with a system fulfilling (|6^), the degradation threshold obviously is the last one as fi increases. 

The criterion ( |63| ) implies an important necessary condition for a degradation threshold, 
namely r(3;max) > — oo; i.e. genotypes should not be lethal at this point. This parallels a well- 
known sufficient condition for the existence of a normalizable limit distribution for arbitrary 
mutation rates in models with non-compact state space ( [Moran, 1977] ; Biirger, 2000 , p. 128). 



From a biological point of view, a finite value of r(xniax) means that not the whole genome, 
but only the part relevant for a specific function or phenotypic property is included in the 
model, and the genetic background is under sufficiently strong selection to be stable under 
the mutation rates considered and guarantees survival of the population. We may then obtain 
mutational degradation w.r.t. the function under consideration if this function is less robust 
under mutation than the background, and fitness thus levels out at a finite value. Essentially, 
this is the threshold criterion previously given by Wagner and Krall (1993| ) in their treatment 



of single-step models with unidirectional mutation (see the discussion in Section 6.3). 

For the more general model with back mutations, we see that r{x) must approach the fitness 
level at r(xniax) sufficiently fast in order to fulfill (|63|). For the biallelic model, it is easy to show 
that we need positive epistasis with at least a quadratic exponent, i.e. r ~ ?^(xmax)+a {xmax—xy. 
Clearly, we always obtain mutational degradation if r(x) = r(xmax) already for x < Xmax, 
corresponding to the reasonable assumption that a minimum of non-random coding region is 
needed for the gene or function considered to show a fitness effect at all. An example for a 



degradation threshold is given in Fig. [12. 



Note finally that we obtain a degradation threshold that at the same time is a wildtype 
threshold (and a fitness threshold with a jump of x from Xmin to Xmax) if and only if 

r[x) - r(xmax) - g{x) < , (64) 

as is most easily seen with the help of the graphical representation, cf Fig. |^. Clearly, Eq. (|6^ ) 
is fulfilled for the sharply peaked landscape used in Fig. |^, but also for truncation selection, see 
Fig. p. 
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FIG. 12. Means (middle) and variances (right) for a model with asymmetric mutation (k = 0.8), 
A'^ = 100 (symbols), and the fitness function r{x) = 7 (xmax — a:^)'^/(a;max)^ with Xmax = (1 + '«)/2 — 0.9 
and q — 2.2 (left). As g > 2, one finds a degradation threshold {nt ll — 0.606), which is also a fitness 
threshold, cf Fig. As f behaves just like r{x) with a similar accuracy of the approximation, it is not 
shown here. 
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FIG. 13. Means (middle) and variances (right) for a model with symmetric mutation (k = 0) and 
truncation selection, i.e. r{x) = 7 for x < | and r{x) — otherwise (left). As in the sharply 
peaked landscape, cf Fig. 0, one finds a combined fitness, wildtype, degradation, and trait threshold 
(/^c/7 — 2.94). Also, the variance in fitness follows the different kind of scaling as described by ( |49| ) and 
is given by (|3^) for N 00. Symbols correspond to = 100, dashed lines to = 1000, and solid 
lines to the expressions in Section ^. As f behaves just like r{x) with similar accuracy, it is not shown 
here. Note that the deviations of the approximate expressions are somewhat stronger (of order N~^/'^) 
for fitness fimctions with jumps, cf Section 5.3. 
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FIG. 14. Means (middle) and variances (right) for a model with asymmetric mutation (k — 0.5) and 
a fitness function r/7 (left) with an ambiguity for r{x)/j = 0.5. Thus, one finds a trait threshold 
{Hc/l — 0.372), which precedes a fitness threshold (^c/7 — 0.408), cf Section |6.2.4 . Symbols correspond 
to N = 100, dashed lines to = 500, and solid lines to the expressions in Section ^. As f behaves just 
like r{x) with similar accuracy, it is not shown here. 



6.2.4 Trait thresholds 

Phenomenon. As stated above, there is usually a kink in the population mean of the muta- 
tional distance x{fi) (or some other trait) at a fitness threshold. The most pronounced change 
in the equilibrium distribution of x, however, is a jump of x at some mutation rate //^, referred 
to as a trait threshold. Since a discontinuous change in x is usually accompanied by a jump in 
the local mutation rates u^{x) as well as r'(x), it typically also leads to jumps in vx and vj^. 
The mean fitness, however, is not at all affected at such points (if they do not coincide with a 
fitness threshold as defined above). 



Criterion. Since the equilibrium mean fitness f(/x) as a function of the mutation rate is always 
continuous, we easily conclude from f = r{x) that a jump in x occurs if and only if the fitness 
function is not strictly decreasing from Xmin to x^aax- 



Discussion. Obviously, any fitness landscape with a trait threshold also fulfills (|5q ) and thus 
also has a fitness threshold, but not vice versa. We have fic ^ /"c (i-e. the jump in x in general 
precedes the fitness transition with the jump in x); see the example in Fig. 14. This shows, in 
particular, that with varying mutation rate there may be large changes in the phenotype that 
may be accompanied by changes in the fitness variance, but have virtually no effect on mean 
fitness. Trait and fitness thresholds should, therefore, be clearly distinguished. In contrast to the 
fitness threshold or a phase transition in physics, the trait threshold is not driven by collective 
(self-enhancing) action, but simply mirrors a local feature of the fitness function. 



6.3 Unidirectional mutation 

In this section, we briefiy discuss how the definitions of mutation thresholds specialize for uni- 



directional mutation (k = 1 for the biallelic model). An example is given in Fig. 15. We shall 



also take the chance to make contact with previous results on threshold criteria by Wagner and 
Krall (1993) and by Wiehe (1997), where related models were studied. 



For the above definition of thresholds, the maximum principle in the mutation class limit 
has played a central role. Since, for vanishing back mutations, it reduces to Haldane's principle 
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FIG. 15. Means (middle) and variances (right) for a model with unidirectional mutation (k = 1), 

= 20 (symbols), and the fitness function r/7 shown on the left. The means f and x were calculated via 
the discrete maximum principle ( ^ ) . For x and the variances the population distribution was calculated 
explicitly using the recursion following from (|^) for = 0; solid lines refer to the expressions from 
Section |[ One observes both a wildtype and a degradation threshold. As f is exactly r{x), it is not 
shown here. 



and is also exact for finite N ^ many of the notions above can be formulated directly, avoiding 
the limit. As has been described in Section 4.1, the ancestral mean of the mutational distance 



(or trait) agrees with the minimal x in the equilibrium population. Since this minimum can only 
assume discrete values for finite A^, jumps in x will necessarily occur for some ^. For a system 
with a large number of mutation classes (which we consider here), this should, however, not be 
regarded relevant. In line with the above reasoning on the applicability of the fitness threshold 
definition for finite systems and previous definitions of the error threshold for unidirectional 
mutation, it seems reasonable to restrict the term threshold to the first and the last jump, 
i.e. the loss of the wildtype (Wagner and Krall, 1992; Wiehe, 1997| ) and the point of complete 
mutational degradation ( Wiehe, 1997 ), and to jumps of x over more than one class. 

Threshold criteria are easily found as analogs of the above relations (note that g reduces to 
if back mutations vanish). As condition for the existence of a fitness threshold with a jump 
over more than one class, for example, we obtain for monotonic u^: 



max 

k 



'fc+1 



''k+2 



> 0. 



(65) 



We always find a wildtype threshold with loss of the fittest class if the total range of fitness 
values is finite. If there are lethal genotypes (r^ = —00), we obtain no such threshold if (and 
only if) the mutation rate at all non-lethal types is larger or equal to the mutation rate at the 
wildtype. For the special case of a constant mutation rate {u^ = const, k < N, = 0), this 
reproduces the criterion by Wagner and Krall (19931 ). A degradat ion threshold is found if and 
only if there are no lethals. 

For the special case of the biallelic model with linearly decreasing mutation rates, Eq. ( |65| ) 
reduces to max^s^/s^^^ > 1 and we obtain a threshold for any degree of positive epistasis, 
but also for linear parts of the fitness function (with any three fitness values on a straight line). 
For fitness functions of the form r{x) = ro(l — x)°, finally, we can confirm the result by |Wieh^ 



(1997 ) that wildtype and degradation thresholds coincide if and only if a < (no or negative 
epistasis). Note that the result by [Wiehe (1997 ) was derived for a different mutation scheme 
(with mutations coupled to reproduction). 
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6.4 Variation of fitness values and sequence lengths 

Up to this point, we have discussed mutation thresholds as effects that may occur as the mutation 
rate varies, while the fitness function and the number of mutation classes are kept fixed (note 
that the mutation class limit is always understood as an approximation to a given finite system) . 
Here are two alternative points of view. 

Firstly, we can consider threshold effects as the fitness values vary, while the mutation rates 
remain constant. As already mentioned in the discussion of Haldane's principle (Section |5.2| ), 
mean fitness is largely independent of local variations in the fitness function, but only depends 
on the shape of r in regions with substantial weight in the ancestral distribution. For most 
values of the mutation rate, this has a unique peak, and therefore only the neighborhood of 
the mean ancestral mutational distance, x, matters. At fitness thresholds, however, we find, in 
general, two peaks at which variations in r can change the mean fitness. 

Secondly, in line with the original work of Eigen (1971| ), we can increase sequence length while 



leaving the mutation rate per site fixed, thus altering the total mutation rate. The question of 
interest then is: Given a certain (fixed) fitness advantage of some function, and fixed mutation 
rates per site in a molecular model, how long can the coding region for the function become 
and still be maintained intact by selection? In this case, with u^/j ~ iV (where 7 denotes the 
range of fitness values under consideration), we obtain thresholds which are inversely related 
to sequence length, ~ 1/-^) in all situations above. Note that this is in accordance with 
the original findings for the sharply peaked landscape (cf [Eigen and Biebricher, 198^ ), but at 



variance with results by Wiehe (1997 ). The latter are artificial effects caused by the use of a 
different scaling of the fitness functions, in which 7 is not kept fixed, but increases with in 
just those cases where conflicting results have been found. 

6.5 Implications of mutation thresholds 

At mutation thresholds, mutation-selection balance is unstable with respect to small changes 
in the model parameters. There is no real lower limit on the mutation rates at which these 
phenomena may happen, but for fitness and degradation thresholds mutation rates must be 
comparable to the average mutational effect 7 to obtain effects of significant magnitude (cf our 
examples in Figs. I and OHll). In this case, the average effect of the mutations considered 
will be very slightly deleterious (or almost neutral) for realistic values of the mutation rate. 
The model then pays respect to the rationale that these mutations are the relevant ones for the 



discontinuous behavior. Since they may be numerous (cf, e.g., Kondrashov, 1995), their collective 
effect may nevertheless be quite large. Mutations with much stronger effects, on the other hand, 
will only occur at very low frequency in the population and contribute smooth changes to the 
system observables if the mutation rate is varied. They may therefore be excluded from these 
considerations. 

An important consequence of the original error threshold of the sharply peaked landscape 
(and, more generally, of any degradation threshold in our typology), which has been stressed in 



particular by Eigen (1971 ) as well as by Maynard Smith and Szathmary (1995 , Ch. 4.3), is its 



potential importance for the evolution of mutation rates. Since the total mutation rate increases 
with the sequence length (see the previous subsection), site mutation rates must evolve below the 
threshold value to allow functions to prevail that need a certain minimum length of the coding 
region as their genetic basis. This might have been a severe problem for early replicators since 
the mutational repair mechanisms required to reduce the mutation rate depend on enzymes with 
relatively large coding regions. Since we find degradation thresholds for a rather broad class 
of fitness functions, this is also a plausible hypothesis with respect to our more general model 
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class .0 

A closer look at the effect of thresholds on the mutational loss reveals yet another mechanism 
by which degradation thresholds, and fitness thresholds as well, may be important for the 
evolution of mutation rates, even if mutational repair itself is not the function endangered. 
Assume that the mutation rate may be reduced by modifications of the replication accuracy. 
Recall further that the mutational loss g = r — f provides a measure for the indirect fitness 
advantage 5f gained by the decrease of //. Therefore, a system beyond a degradation threshold 
(where g = 0) will never experience any selection pressure for decreasing mutation rates, and 
thus cannot evolve in this direction. But even a fitness threshold (with a jump in g, but g > for 
fi > /ic) may have a similar effect. This is because modifiers for reduced mutation usually have 
deleterious physiological side-effects, dubbed the 'cost of fidelity' (see Sniegowski et al, 200C, 
for a recent review). Clearly, for the modifier to prevail, the indirect fitness advantage 6f gained 
by the decrease of fi must be at least as high. Therefore, a jump in g separates two different 
evolutionary regimes: for fi < /j^c, much larger costs can be counteracted than for fj. > fic- 

In a second line of interpretation, the critical mutation rate of an error threshold has often 
been argued to provide a strict upper limit that must be avoided in all real organisms. Certain 
kinds of viruses are perceived as thriving just below that value as to maximize their adaptability 
in a changing environment ( Eigen and Biebricher, 1988 ). While it is certainly true that wildtype 
sequences or certain functions can get lost at threshold points, it is, however, much more difficult 
to argue why evolution should care about them. After all, g drops at the threshold, thus making 
a further increase in adaptability less costly. Further, the equilibrium mean fitness changes 
continuously with the mutation rate in arbitrary deterministic mutation-selection systems, even 
at threshold points. Mutation thresholds, therefore, can not be seen as strict limits constraining 
the evolution of mutation rates. This may be different if further evolutionary forces are relevant, 
most importantly drift. Indeed, numerical studies show that the mean fitness (averaged over 
time) may drop discontinuously at critical mutation rates in a finite population ( [Nowak and 



[Schuster, 19891) . A jump in the mean fitness has also been found for sexually reproducing 
populations with dominance ( [Higgs, 1994| ). This, however, is outside our model class and, 
according to our definition, no longer a property of a mutation threshold but essentially a drift 
(or segregation) phenomenon. 

Let us finally turn to yet another effect that has previously been described as characteristic 
of the error threshold (e.g. Bonhoeffer and Stadler, 1993 ). Assume that mutation classes increase 
in size with the distance from the wildtype (as is the case for the biallelic model for k up to 
N/2), which is reflected by asymmetric mutation rates between neighboring classes. Then, a 
jump in X, as at the critical mutation rate fi^ of a trait threshold, entails a delocalization effect. 
It should be stressed, however, that this effect has no direct consequences for the evolution of 
mutation rates, which are entirely connected to the population mean fitness and thus only to 
fitness, wildtype, and degradation thresholds. 



7 Summary and outlook 

The findings of this article, and the future directions they might lead to, fall into three parts, 
which we would like to discuss in turn. 

^^Note that the fitness effect of mutational repair is always an indirect one caused by an increase in the copying 
accuracy in parts of the sequence that are directly related to fitness. 



41 



Ancestors. As a crucial concept for the study of (asexual) mutation-selection models, we 
have identified the ancestor distribution of genotypes, or genotype classes, which, in mutation- 
selection balance, is the equilibrium distribution of the time-reversed evolution process. The 
ancestor frequency of the i-th genotype (or class) is given as aj = ZiPi, where Zi, the relative 
reproductive success, and pi, the equilibrium frequency, are the i-th entries of the left and right 
leading (PF) eigenvectors of the evolution matrix. In the biology-physics analogy laid down 
in Appendix the ancestor distribution corresponds to the distribution of the bulk magneti- 
zation in spin models. Biologically, measurements of ancestor frequencies in real populations 
should in principle be possible by marker techniques. In the equilibrium dynamics, the ances- 
tors permanently feed the swarm of mutants that is observed at any instant of time. Significant 
evolutionary change is indicated by modification of this ancestor population. We have shown 
this in a couple of instances. 

If the fitness values Ri are subject to change, the measure the sensitivity of the equilibrium 
mean fitness R to these changes. The net total change in R is given (to linear order) by 
Eq. (25). If the fitness changes are due to a modifier mutation, Eq. ( [55| ) can be read as the 
selection coefficient of this modification with respect to the ancestors. Such a modifier will 
asymptotically fix if and only if it increases the fitness of the ancestors. The vector of ancestor 
frequencies can therefore be seen as the gradient which points into the direction of the effective 
selection pressure on the fitness function and determines the course of evolution - given that 
the appropriate modifier mutations are available. 

Since the Oj are non-negative, selection will always favor an increase of fitness values. In the 
case of modifiers that change the mutant fitness values, we thus find a tendency for the evolution 
of robustness, or canalization, in systems with back mutations, whereas anti-robustness, or de- 
canalization, cannot result in this simple setup. As the selection pressure is strongly differential, 
one can even speculate that this mechanism is a cause for negative (synergistic) epistasis (as in 
the example in Fig. |3|), which is con sidered a rather general phenomenon by many ( prow and 



Simmons, 1985 ; Phillips et al, 200C , and references therein). As always with indirect selection, 
however, selective forces are weak and probably of relevance only in large populations and for 
rather high mutation rates. In the limiting case of unidirectional mutation, the ancestor distri- 
bution is concentrated at the wildtype (if present in the equilibrium population). Then, only 
modifiers that increase the wildtype fitness will go to fixation, whereas modifications of mutant 
fitness values have no effect on the equilibrium mean fitness - in line with the predictions of the 
Haldane-Muller principle. 

We have defined the mutational loss G as the difference between ancestor and population 
mean fitness, which equals the long-term loss in progeny that the equilibrium system suffers 
due to mutation. Eq. (|2j) shows that the loss determines the change in the equilibrium mean 
fitness if the mutation rate is subject to change. Again, it is thus the ancestor distribution that 
provides the link between external variations of model parameters and the equilibrium response. 
We always have G < L, with L the mutation load, and equality only in systems without back 
mutation to the fittest type. Measurements of the mutational loss should be possible by fitness 
measurements in mutator strains or by direct determination of the ancestor fitness distribution 
using genetic markers. 

The ancestor concept, as introduced in this article, is independent of modeling assumptions 
on fitness landscapes and mutation schemes. We have derived a few basic results that hold 
for this general case, and extend the Haldane-Muller principle. Under additional assumptions 
much stronger results may be obtained, as we have seen for the single-step mutation model. We 
expect that, of the many successful approximation methods that are routinely applied to the 
population distribution, some could also be applicable for the ancestor distribution and yield 
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further interesting results. However, in order to apply this approach to more general situations, 
namely including genetic drift, the concept will have to be extended. The question is whether 
it is possible to characterize the distribution of genotypes on a single lineage backward in time, 
and to relate this to the mutation-selection-drift equilibrium. 



The maximum principle. The reformulation of the equilibrium condition in terms of an- 
cestor variables leads to a maximum principle for the equilibrium mean fitness, which we have 
exploited for the single-step mutation model. In this model, fitness is an arbitrary function of 
the number of mutations (or some other additive trait). Mutation proceeds stepwise on the 
mutation classes, but mutation rates (as well as back mutation rates) may vary from class to 
class. Here, the maximum principle may be recast into a particularly simple form, which yields 
the mean fitness as the maximum of the difference between the fitness function and the mu- 
tational loss function (see Eqs. ( |30[ ) and (|3l|)). The position of the maximum determines the 
mean ancestral genotype and the corresponding value of the mutational loss function yields the 
mutational loss G (Eq. (|33|)). The simplicity of the maximum principle results from the fact 
that maximization is over one single scalar variable only, and may be performed explicitly, or 
with the help of a simple graphical construction (Fig. ^). A different maximum principle has 
been suggested previously for mutation-selection models ( Pemetrius, 198^ ). It relies on gen- 
eral variational principles in the framework of ergodic theory, in which maximization is over all 
possible genealogies, and therefore not constructive. 

Our maximum principle is exact in three independent limiting cases, namely unidirectional 
mutation, models with a linear dependence of both mutation rates and fitness on an underlying 
trait (including multilocus wildtype-mutant models without epistasis), and in the limit of an 
infinite number of mutation classes. For small back mutation rates, u~ ^ (u"^ + s), the resulting 
estimate for the equilibrium mean fitness is exact to linear order in . In general, the maximum 
principle holds as an approximation that leads to quantitatively reasonable results for a wide 
range of parameters and quickly becomes accurate if one of the exact limits is approached. 

Starting from the mean fitness, we have explicitly calculated the fitness variance and the 
mean and variance of the trait. All formulas are collected in Section ^. The fitness variance 
is both proportional to the mean mutational effect and the mean difference of deleterious and 
back mutation rates; the trait variance has the same dependence on the mutation rates, but is 
inversely proportional to the mean mutational effect (Eq. (35)). These formulas give the amount 
of genetic variability that is maintained by the balance between mutation and selection. 

Extensions of the maximum principle to a larger model class is possible in various ways. 
Following the lines of this paper, it is relatively straightforward to include double or multiple 
mutations in the theory. Poisson distributed mutations (which emerge naturally in the biallelic 
model if mutation is coupled to reproduction) can also be treated. A necessary ingredient is that 
the evolution matrix can still be symmetrized by transformation to the ancestor frequencies. 

The models discussed here all assume fitness to depend only on the distance to a reference 
class (the Hamming distance to the reference type in the biallelic case). Especially in a molecular 
context, this is, of course, a severe oversimplification. But also in classical population genetics, 
the importance of variance of additive and epistatic effects has often been highlighted (see, e.g., 
Biirger and Gimelfarb, 1999| ; [Phillips et al, 20001) . Pro gress in this direction can be made by 



applying methods of inhomogeneous mean-field theory from statistical physics to the biallelic 
model. Here, it is possible to derive a simple maximum principle for models in which groups of 
sites or loci have different weights assigned that scale their respective direct and epistatic fitness 
effects (H.W., unpublished results). With similar techniques, fitness landscapes with more than 
one trait, such as the multiple quantitative trait model ( Taylor and Higgs, 200C| ), can also be 
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treated. Here, the equilibrium mean fitness is derived from a maximum principle over an n- 
dimensional space, if n is the number of traits. Finally, multilocus models with more than two 
alleles per locus (or states per site) may be considered. In the molecular context, an explicit 
treatment of the four-letter case with Kimura 3ST mutation scheme (cf SwofFord et al, 1995| ) 
has already been given by Hermisson et al. (2001| ). 



Mutation thresholds. Inspired by the definition of phase transitions in statistical physics, 
we have used the concept of the mutation class limit to define threshold behavior in mutation- 
selection models as the discontinuous change of statistical observables (such as the mean fitness 
or the mean number of mutations) with the mutation rate ^. Four different types of thresholds 
have been singled out, which all coincide in Eigen's original error threshold for the model with 
the sharply peaked landscape, but should be distinguished for general fitness functions. With 
the help of the maximum principle, criteria have been given to characterize the fitness functions 
and mutation schemes that lead to each type of threshold. 

Fitness thresholds are characterized by a kink in the population mean fitness and a jump 
in the mutational loss G. They precisely occur at mutation rates for which the equilibrium 
ancestor distribution that solves the maximum principle is non-unique in the mutation class 
limit. The evolutionary significance of a fitness threshold lies in its potential impact on the 
evolution of mutation rates. Since the mutational loss jumps and may take much smaller values 
for /i exceeding the critical mutation rate, the gain in mean fitness by reduction of fi may be 
very small until the threshold is reached. If this gain in fitness must (over) compensate costs 
connected with mutational repair, evolution for lower mutation rates might be slowed down 
in the presence of a threshold. For the existence of fitness thresholds, positive (antagonistic) 
epistasis is required for many mutation schemes. Small convex parts of the fitness function, 
however, may already be sufficient. Fitness thresholds are collective phenomena and correspond 
to phase transitions in physics. 

Whereas the loss of the wildtype from the population is not a well-defined notion for most 
of the models treated here, we consider the ancestor mean in the mutation class limit instead. 
A wildtype threshold is then characterized by a critical mutation rate yU^T > below which 
the ancestor mean fitness coincides with the wildtype fitness, and the ancestor distribution is 
concentrated at the wildtype. Below a wildtype threshold, the system behaves, in many respects, 
as a system with unidirectional mutation. For the biallelic model, wildtype thresholds occur only 
for fitness functions with very sharp peaks at the wildtype position. 

A degradation threshold is characterized by the fact that selection altogether ceases to operate 
and the mean fitness does not change any further for mutation rates exceeding a critical value /i^ . 
A necessary condition for a degradation threshold is that the fitness function does not diverge to 
minus infinity. This is reminiscent of a threshold criterion derived for a model with unidirectional 
mutation by [Wagner and Krall (1993| ) . Degradation thresholds have similar implications for the 
evolution of mutation rates as fitness thresholds. 

A trait threshold, finally, is characterized by a jump in the trait or mean number of mutations 
X. In the sequence space picture, a trait threshold is connected with (partial) delocalization of 
the equilibrium population in genotype space. It is important to note that a trait threshold is 
not a collective phenomenon but is simply caused by non-monotonic parts of the fitness function. 
The delocalization effect is not connected with any significant change in the mean fitness (unless 
the trait threshold goes together with a fitness threshold) , and thus has no direct impact on the 
selection pressure on the mutation rate. 

The types of thresholds found here should also be observable in mutation-selection models 
with more general fitness landscapes and mutation schemes. Explicit threshold criteria can be 
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obtained at least in some cases, such as the four-state model treated by Hermisson et al. (2001| ) 
(J.H., unpublished result). 



A The connection to physics 

For a number of models from statistical physics, a relation to mutation-selection models has 
been demonstrated, see Baake and Gabriel (200d| ) for an overview. In the present investigation, 
too, concepts and techniques from theoretical physics have served as a guideline for analysis. 
Most importantly, the maximum principle ( |30| ) derives from the physical principle by which 
a system seeks to minimize its free energy. In our definitions of mutation thresholds, we also 
exploited the correspondence between thresholds and physical phase transitions, which has first 

[198]). 



been pointed out by Leuthausser 

Whereas such correspondencies can be very fruitful, they require a detour through the phys- 
ical world, which remains unsatisfactory from the biological point of view. Therefore, our in- 
tention in the body of the article has been to develop and discuss concepts entirely within the 
biological framework. Nevertheless, for readers with a physical background, as well as for bi- 
ologists who are familiar with the interface to statistical mechanics, we will briefly sketch the 
relationship between both approaches. This may, on the one hand, facilitate further transfer of 
methods; on the other hand, limitations of certain 'imported' concepts in the biological context 
become obvious. Last but not least, it is exactly this connection which resolves a few issues that 
had remained enigmatic so far. 

Concentrating on the biallelic model with types s in this appendix, we can rely on a con- 
nection to a model of quantum statistical mechanics that was previously established by Baake| 
et al. (199^ ) (see also Wagner et al.., 199^ ). More precisely, the evolution operator of the biallelic 
model with symmetric mutation was shown to be exactly equivalent to the Hamilton operator of 
an Ising quantum chain (up to a minus sign). Generalizing this slightly to include asymmetric 
mutation rates, and assuming a suitable ordering of genotypes, we may represent the quantum 
chain Hamiltonian as 



H = /i 



^«-I)-K^« + <) 



(66) 



nei 



Here, 



< := I . . . I O I . . . I , 

n—l copies N—n copies 

where a equals x, y or z, and o"^'^'^ are Pauli's matrices, i.e. 



a :- 



1 

1 0/ ' 



-i 

1 



and :- 



(67) 



1 
-1 



The last sum in (|6^) runs over all index sets / C {1, . . . , A^}, and rj^ are the interaction coefficients 
of the spins within the chain, or with a longitudinal field. The collection of the ry^ determines 
the fitness function. Further, there are transversal fields in x and y direction that account for 
mutation. Note that the Hamiltonian is non-Hermitian for asymmetric mutation. 

This equivalence was used by Wagner et al. (1998| ) and Baake and Wagner (20011 ) to solve 
the model for a couple of fitness functions and symmetric mutation {k = 0) with the help 
of methods from quantum statistical mechanics. In the current investigation, we have chosen 
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an equivalent formulation, which remains entirely within classical probability, to analyze more 
general mutation and fitness schemes. In order to briefly sketch the connection between the 
approaches, we first symmetrize H by means of a similarity transform, i.e. H = SHS~^ with 
S := J^^-|^ (cosh(0/2)I + sinh(^/2)(T^) and 9 = artanh(K). (Note that this transformation relies 



on the sequence space representation of H (36), in contrast to the symmetrization in Section 



2.2| , which starts out from a mutation class representation.) 

The central concept now required is the quantum mechanical expectation (O) of an operator 
O, defined by 

(0)(t):=tr(Op(t)), (69) 
where p{t) is the so-called density operator 

p{t) := exp(tH)/tr(exp(tH)) , (70) 

and t corresponds to the inverse temperature. For the choice O := X^^=i obtains the 

quantum mechanical magnetization, which is the crucial order parameter for the quantum chain. 
We will concentrate on the limit t ^ oo (the ground state), where p{t) becomes identical 



with the time evolution operator of the critical branching process we have met in Section 2.2. 
That is, p = iiuit^oo p{t) = limt_+oo exp(t(H — AmaxI)) = PP^ / {p,P) , where (•,•) denotes the 
scalar product, and T means transposition. In this limit, the quantum mechanical expectation 
of any diagonal operator O (defined by the elements Oss) therefore turns out to coincide with 
the corresponding ancestral average (cf Section |2 



(O) = tr(Op) = = Oss^^ = Ossas = O, (71) 



where we have used that as = Ps/X^s'i's' symmetric H, in line with Section |2.2| . In par- 
ticular, the quantum mechanical magnetization (given by Oss = — '^duis, s^)/N)), which 
has, so far, appeared as a crucial but technical quantity unrelated to any biological observ- 
able, now emerges as the mean ancestral genotype x (up to a factor and an additive constant). 
In contrast, the classical magnetization ^gOssPs, which we had termed surplus previously, 
translates into the population average x. Let us note in passing that the change in normaliza- 
tion performed in Baake et al. (1998| , Eqs. (7), (11)) and Baake and Wagner (2001 , Eqs. (55), 



(56)) to formulate Rayleigh's principle for the PF eigenvalue (i.e. Amax = sup(a;,Ha;) where the 
supremum is taken over all x with {x,x) = \\x\\2 = 1) is equivalent to our ancestral transfor- 



mation in (16). This way, we may take advantage of L theory although the original problem 



is inherently in the realm of L^. Finally, the expectation of the non-diagonal operator M is 
(M) = J2s s' Ps^s,s'Ps' = J2s s' ^sMs^s'Ps' (with M := SMS^"*^), which we have identified with 
the loss G in offspring due to mutation (cf Section |5|). 

The concept of ancestral distributions is very general and does not rely on our special dy- 
namical system. It also applies to discrete dynamical systems, as long as they are linear (or may 
be transformed to a linear system) and admit a unique stable stationary state. This is true if a 
system is defined by an iteration matrix T for which the Perron-Frobenius theorem holds (hints 



in this direction may be found in Demetrius, 1983 ). In particular, this applies to a discrete- 



time version of the quasispecies model defined by T^^/ = Vgg'Ws'^ where Vgs' is the mutation 



probability from s' to s, and Ws' is Wrightian fitness of s'. As observed by Leuthausser (1986, 



1987 ) and reviewed by Baake and Wagner (2001 , Appendix II), this model is equivalent to a 



classical two-dimensional Ising model with row transfer matrix T, where the rows correspond to 
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genotypes, and the columns to generations. Hence, every 2D configuration corresponds to one 
line of descent, conditional on nonextinction at present. 

Here, considerable confusion has arisen in the literature as to the distinction and meaning 
of surface and bulk magnetization ( Leuthausser, 1987 ; Tarazona, 1992| ; Franz and Peliti, 1997 ). 
Surface quantities correspond to the last row (in the time direction) of a configuration with 
open boundary conditions, i.e. the current population; therefore, surface averages are population 
averages. In contrast, bulk quantities are averages over the entire 2D configuration. In the limit 
of an infinite number of rows, they become identical with averages over a single row 'in the 
middle' of the configuration (i.e. at infinite distance from both the first and the last row), 
as given by lim„^oo tr(T"OT")/ tr(T^"). Therefore, the bulk average is, again, the ancestral 
average (also compare with (|69|) and (|7l|)). 

Everything we have said so far holds for arbitrary, finite A^. Clearly, the infinite mutation 
class limit — > oo is the thermodynamic limit of the statistical mechanics system with its 
extensive scaling of energy and magnetic field terms. Technical aspects related to this scaling in 
the biological system are covered by Baake and Wagner (200 1] ). While the thermodynamic limit 
may be taken as a matter of course in most classical situations in solid state physics, however, 
the adequacy of the corresponding limit as an approximation in biological applications must be 
thoroughly considered. In particular, the mutation class limit should be clearly distinguished 
from the infinite-sites limit, which is widely used in theoretical population genetics; see the 
discussion in Section 2.7, and Baake and Wagner T2OOII ). 

Clearly, the fitness thresholds described in Section ^ correspond to the phase transitions of the 
physical system, in the sense of a non-analytic point of the free energy of the classical Ising system 
or the ground state energy of the quantum chain (the mean fitness in the biological model). 
Most importantly, the idea to use the thermodynamic limit for the mathematical definition of 



the concept is taken from physics. As we have pointed out (Section 6.1), this is in accordance 



with the original definition of the error threshold for the sharply peaked landscape. It should be 
noted, however, that the fitness functions of the biological system typically lack the symmetries 
inherent in physics. As a consequence, the usual classification of phase transitions in physics 
according to orders of the non-analyticity as well as the consideration of critical exponents 
does not seem to be particularly meaningful in the biological context. Fitness thresholds are 
typically first order and exhibit a jump in the ancestral mean x, which parallels the physical 
magnetization. Note at this point that neither the population mean x (as suggested by p?arazona!| 
19921 ; iFranz and Peliti, 1997|) nor the mean fitness itself (as implicitly in [Higgs, 1994| ) should be 



mistaken as an order parameter, in the sense that jumps in these quantities do not characterize 
first order phase transitions. 



B Proofs from Section g 
B.l The additive case 

Let us prove here that, if fitness and mutation rates depend linearly on some trait = y{xk) 
as described in (^), the system ( ^9|) reduces to just two equations, one corresponding to the 
necessary extremum condition following from (^0|), the other being the defining equation (|3^ ) 
for the ancestral mean y (for y{x) = x). For the sake of simplicity, we write x^ instead of y^ 
here. Taking the difference of two arbitrary equations of the linear system (|39|), say for k and £, 
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divided by ^/ak and y/al, respectively, we get 

(/?+ - - a){xk - XI,) + ^J(i+(i- (^\J Xk{l - Xk-i)^J^^ - \l xiil - x^^x, 



+ l/xfc+lCl - Xk) 



With the ansatz 

Qfc-i 



C- 



Xk 



1 - a^fc-i 



^ ^^^-iL^ VA;G{l,...,iV} 



. (72) 



(73) 



Eq. (72) can be divided by {xk — xi) and becomes independent of k and £. Note that (73) also 
takes care of the boundary conditions a_i = a^+i = if xq = and xn = 1. Summing both 
sides of (1 — Xk-i)ak-i = Cxkdk over k, we obtain C = {1 — x)/x and thus from (72) 

1 - 2x 



-a + V/3+F 



0, 



X) 



(74) 



which is exactly the extremum condition r'(x) = g'{x) following from (^0|). Together with the 
negative second derivative this implies the maximum principle. 

On the other hand, we can use (^) to eliminate ak±i from (|39|). After multiplication by 
y/ok this reads 



ro - axk - r - /?+(! - Xfc) - /? Xk + \//5+/3 ( 



1 — X 



+ (1 - Xk) 



1 — X 



and we obtain by summation over k 

f = tq — ax — — x) — P~x + 2-y//3+/3~x(1 — x) = r(x) — g{x) 



ak = (75) 



(76) 



which corresponds to (|33D. Since fitness is assumed linear in the trait, the mean values with 
respect to the population distribution are also related via f = r(x). 



B.2 The case N 



oo 



Let : [0,1] M>o be continuous and positive, but fulfill ti^(l) = ^^(0) = 0. Let r : 
[0, 1] M have at most finitely many discontinuities, being either left or right continuous at 
each discontinuity in ]0, 1[. Then, with the scaling described at the end of Section "LG, the 
maximum principle (|30D holds in the limit — > oo. 



For a proof, we follow the arguments and notation introduced in Section O. First note that 
the lower bound for f^v in (46) is itself greater than or equal to 



Pk,m,n-= inf {r{y)-g{y))- sup \g{y) - gN{y)\ 



m + n + 1 



(77) 



where Ik,m,n = l^jf^: s-iid the rules for inf/sup have been applied. We will now construct a 
sequence /JAr(x) := Pkpi{x),mM{x),nN{x) fo^' each x G [0, 1], using suitable sequences for the indices, 
such that 



PNix) r(x) - gix) . 



(78) 
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Since, by definition, Umjv— >oo '"Af — foot Eqs. (gD, (|73), and @ will then establish foo > 
sup3,g[o y(r(x) — g{x))^ from which, together with the upper bound in (pS]), the claim will follow. 

Note first that, for x = or j; = 1, pn{x) = PxN,o,o = — g{x) holds for arbitrary N . 
Now, fix X € ]0, 1[. If r is continuous in [x — d,x\ for a suitable d > 0, let k^^x) := [xN\, 
nT-Nix) = ld^/N\, and n]\;{x) = 0. Otherwise r is continuous in + for some d > 0, and we 
define k^ix) := \xN~\, m]\f{x) = 0, and ni\f{x) = [dy/N\. With these choices, the last term in 
( |77| ) vanishes for ^ cxd since mAr(x) + njv(x) oo, and the enumerator is bounded. So does 
the supremum term because of the uniform convergence 9- supy^j^ ^^^^ \g{x)—g]\f{x)\ < 



supj^g[o,i] Idix) — gN{x)\ — > 0. The latter follows from the uniform continuity of yvA since, in 



\g{x) - gNix)\ 



\Ju+{x - - yju+{x)^ (x) + \/n+(x) [\ju-{x + ^) - 



(79) 



the terms in parentheses vanish uniformly in x as A'^ — > oo and y^i^^^x) is bounded. Finally, the 
infimum term in (77), and thus yOAr(x), converges to r(x) — g(x) since ^^^^^{a;) — > r is continuous 
in all /at 9 x, and |/Ar| = {ra^{x) + nN{x))/N — > 0. This was to be shown. 

Now, let us prove that the ancestor distribution is concentrated around those x for which 
r(x) — g{x) is maximal, from which Eq. ( |3^ ) follows if the maximum is unique. Multiplying the 
evolution equation in ancestor form (^) by we obtain, after summation over k: 



rN 



N 



fc=o 



^Ju+{xk~l)u {xk)^Jakak-i + ^u+{xk)u {xk+i)y/ak+iak ■ (80) 
Using ^ayfeOfcii < ^(a^ + ak±i), we get 



TAT 



< ^ [r(xA;) - 5fAr(xfe)] ak = rN - {9n)n 



(81) 



fc=0 



with gN{xk) as defined in Section 4^. Since ^ r and gN{x) — > ^(x) uniformly in x G [0, 1], 
we can find for any given e > some N^, such that for all N > Nf,: 



N 



[r(xfc) - ^(xfc)] ak> r -e^ 



(82) 



fc=0 



We now divide this sum into two parts, := "^^ky +Sfc<- '^'^^ ^'^^^ part, X]fc>i collects all k 
with r(xfc) — 5(xfc) > f — e, the second part contains the rest. We then obtain 



N 



r -e < > ir^Xfcj - g{Xk 
k=o 



)]ak<r'^ak + {r -e)'^ak = r -e^ 



ak 



(83) 



and thus J2k< < £• We conclude that for sufficiently large, the ancestor distribution is 
concentrated in those mutation classes for which r(x) — g{x) is arbitrarily close to its maximum, 
f. 
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C Proofs from Section ^ 



C.l Proof of 

We first prove that the negation of ( |58| 



r"{x)-^-^f^<0, VXG [x^in,Xn,ax], (84) 

9 v^J 



implies (57) and is therefore a sufficient condition for the absence of a fitness threshold. We start 
by showing that both r and g are strictly decreasing in ]xmin, a^max[- To see this, suppose there 
exists an X > Xmm with r'{x) = 0, and let be the smallest such x. Then either g'{l,Xr) = 
and lim^^^a;^ (r"{x) — J = r"{xr) —r"{xr) = in contradiction to (^^, or g'{l,Xr) 7^ 0, 



in which case we obtain r"(xr) < in contradiction to r'{x) < for x £ ]xmin) Xr[- On the other 
hand, imagine g'{x) = for some x £ ]xjam, Xraax[, and let Xg be the largest such x. Then, since 
g'{x) < for X G ]xg,x^ax[, we have g"{xg) < and thus hnix^Xg 9"{x)/g' (x) = +00 for the 
right-sided limit, which again contradicts ( p^ ) since r'[xg) < 0. Therefore, //(x) := r'(x)/g''(l,x) 
is well-defined everywhere in ]xmin, a;max[, it guarantees r'(x) = g'{fi{x),x), and (|8^ yields 
r"{x) < g"{p{x),x), which completes the first part of the proof. 

We now prove that (]58| ) implies a threshold. Assume first that the supremum in (]58| ) is 
larger than zero. Due to the continuity of r, g, and their derivatives, we then find an xq in 
]xrrLin, XmnA with r" (xq) - r' {xo)g" (xq) / g' (xq) > 0. This, however, implies r"(xo) - ^"(^^o) > 
whenever r'(xo) — g'ixo) = 0. Therefore, the maximum of r(x) — g{x) is never attained at xq and 
we must have a jump in x(^). If the supremum in ( ^8|) is exactly zero, we argue as follows. For 
the absence of a threshold, we need a continuous function x(^) whose inverse, by the maximum 
principle, is /x(x) = r'(x)/g'(l, x) > for x G Jxmirn 3^max[- For the derivative of /m(x), we 
find /i'(x) = [r"(x) — r'{x)g"{x)/g'{x)]/g'{l,x), which must be non-negative in the absence of a 
threshold. Consider now those x at which the supremum in ( p8[ ) is attained. For g'(l,x) 7^ 0, 
we have /i'(x) = 0. Since x'(//) = l//^'(x), x(/i) has a diverging derivative at these points, and 
a jump if the supremum is attained (and thus fJ.'{x) = 0) on a whole interval. Finally, we also 
obtain a jump if the supremum is attained on an interval where also g'{l,x) = as then the 
whole interval is degenerate as a maximum. We exclude the special case that g'{l,x) = at an 
isolated x to avoid lengthy technicalities. 

C.2 Proof of (§T]) 

Note first that existence of a wildtype threshold obviously implies a lower bound of l/fJ-^ on the 
left hand side of (|6l|). Assume, on the other hand, that there are sequences Xj in [xmin, a^max] 
and /Uj > with /ij — > and r{xi) — fj,ig{l,Xi) > r(xinin) — Aii5(l, a^min) for all i. Let then 
Xj — > Xoo be a convergent subsequence. Since r and g are assumed to be continuous, we have 
i^ixoc) ^ ^(a^min) snd hence Xqo — a^min; since r(xmin ) is the unique maximum of r in [xmin, a^max]- 
Thus, we find 

g(l,Xj) -g(l,Xmin) ^ 1 _ 

f f ^ > ^ 00 , (85) 

contradicting (^l]) and proving the criterion. 
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C.3 Proof of (H) 

The proof is analogous to the case of the wildtype threshold. On the one hand, existence of 
the threshold implies the criterion with a bound /i^. On the other hand, if we have sequences 
Xi in [xmirna^max] and with r{xi) — fiig{l,Xi) > r(xmax) for fii — > oo, we can again choose a 
convergent subsequence xj — > Xoo- Since g{l,x) is continuous and Xmax is the only zero of g 
in [xmin, a^max], we have Xoo = Xmax- As in the wildtype case above, this contradicts (|6^) and 
proves the criterion. 
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